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Objectives  and  Status 

This  project  had  a  start  date  of  September  1,  1993  and  ended  on  31  August,  1996.  The  funding  levels 
were  $  148,495  for  the  six  month  period  9/1/93-2/28/94,  $  298,515  for  3/1/94-2/28/95,  $  303,987 
for  3/1/95-2/28/96,  and  funding  of  $  154,905  for  the  final  six  months  3/1/96-9/1/96.  The  PI  for  the 
project  is  Professor  Tony  Devaney  and  the  other  principles  are  Professors  Ram  Raghavan,  Hanoch 
Lev-Ari,  Elias  Manolakos,  and  Mitch  Kokar  all  from  the  Department  of  Electrical  and  Computer 
Engineering  at  Northeastern  University 

The  overall  objective  of  the  project  is  to  develop,  test  and  evaluate  wavelet  based  algorithms 
for  automatic  target  detection  and  identification.  The  work  and  results  obtained  can  be  roughly 
broken  down  into  three  sub-areas: 

•  Wavelet  based  scale  recursive  and  scale  sequential  algorithms  for  HRR  and  SAR, 

•  Algorithms  and  Architectures  for  fast  and  efficient  computation  of  Discrete  Wavelet  Trans¬ 
forms, 
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•  Wavelet  based  sensor  fusion  algorithms. 


Major  Accomplishments 

•  ATR  scale  sequential  and  recursive  algorithms  have  been  developed  for  high  range  resolution 
radar  (HRR)  that  operate  at  near  optimum  performance  levels  with  computation  times  orders 
of  magnitude  below  conventional  fine  scale  maximum  likelihood  based  algorithms.  These 
algorithms  have  been  extended  to  the  detection  of  electronic  signals  in  ELINT  operations. 
Professor  Devaney  has  presented  this  work  at  a  number  of  ARPA  sponsored  workshops  as 
well  as  at  other  meetings  such  as  at  a  special  workshop  on  ATR  held  at  Phillips  Laboratory 
in  Albuquerque  in  the  summer  of  1995.  He  also  presented  a  three  hour  tutorial  on  this  and 
related  work  at  the  National  Security  Agency  in  February  1996. 

•  Multi-resolution  based  ATR  algorithm  has  been  generalized  to  cases  where  a  target  to  be 
detected/recognized  is  described  only  in  terms  of  the  various  scales  that  make  up  the  target 
signature.  Knowledge  of  the  actual  target  signature  within  each  scale  is  not  required  in 
this  formulation.  The  approach  has  several  features  that  make  it  attractive  for  the  ATD/R 
application:  (i)  Test  for  target  presence  in  each  scale  associated  with  the  signal  subspace 
is  carried  out  independently,  (ii)  the  false  alarm  probability  for  each  scale  may  be  specified 
independently  and  (iii)  the  approach  offers  a  greater  degree  of  protection  against  missing  a 
target  since  the  exact  signature  of  the  target  is  not  required  in  the  algorithm.  Professor 
Raghavan  has  presented  this  and  related  work  at  Lincoln  Laboratories  in  March  of  1996  and 
at  Rome  Laboratories  at  Hanscom  AFB  in  September  1995. 

•  An  analytical  framework  for  optimal  design  of  MSE  (Wiener)  filters  in  the  multiresolution 
(subband)  domain  has  been  developed  that  exhibits  excellent  cost-performance  tradeoff,  and 
is  capable  of  achieving  significant  reduction  in  implementation  cost  with  only  a  minor  degra¬ 
dation  in  performance.  The  design  is  based  on  the  notions  of:  (i)  scale-recursiveness,  (i)  sparse 
estimation,  and  (ii)  optimized  front-end  tree-structured  filter  bank  configurations  and  it  has 
been  shown  that  the  key  to  efficient  cost-performance  tradeoff  when  using  sparse  estimation 
is  the  reduction  of  overlap  between  adjacent  channels  of  the  filter  bank.  This  can  be  achieved 
by  a  suitable  selection  of  the  prototype  filter  bank  response. 

•  M.  Kokar  and  students  developed  a  feature  selection  method  based  on  model  theory  have 
implemented  this  scheme  in  the  Automatic  Feature  Based  Recognition  System  (AFBRS). 
The  method  has  been  tested  on  simulated  ATR  dataand  the  performance  was  compared  with 
the  recognition  method  described  by  Coifman  and  Saito.  In  these  tests  the  method  showed 
better  recognition  accuracy  without  introducing  additional  computational  cost.  We  have  also 
obtained  reduced  dimensionality  of  input  data  through  model-theory  based  feature  selection. 

•  The  synthesis  of  parallel  computational  structures  for  the  Discrete  Wavelet  Transform  (DWT), 
starting  from  a  thorough  and  systematic  algorithmic  data- dependence  analysis  was  investi¬ 
gated  by  Prof.  Elias  S.  Manolakos  and  graduate  students.  Improving  algorithm  performance 
through  parallelism  has  been  addressed  from  two  points  of  view:  By  optimizing  aggregate 
throughput  in  a  series  of  problem  instances  by  means  of  modular  VLSI  architectures  having  a 
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small  number  of  processing  elements  (PEs);  and  by  means  of  scalable  algorithms  for  parallel 
supercomputers  having  a  large  number  of  PEs  and  a  fixed  interconnection  network,  where 
the  object  is  to  minimize  execution  time  of  a  single,  potentially  very  large  problem  instance 
typical  of  SAR  image  processing.  A  family  of  modular  application  specific  VLSI  architectures 
was  derived  for  the  DWT  as  well  as  a  family  of  parallel  DWT  algorithms  for  Linear,  Mesh, 
and  Hypercube  processor  networks.  All  parallel  algorithms  have  been  evaluated  in  terms  of 
their  theoretical  and  practical  scalability  properties. 

•  Model-Based  Target  Recognition  was  investigated  within  the  framework  of  parameter  estima¬ 
tion  of  hierarchical  mixture  densities  by  Professor  Elias  S.  Manolakos  and  graduate  students. 
An  appropriately  modified  version  of  the  Expectation-Maximization  (EM)  algorithm  was  in¬ 
troduced  that  generates  estimates  of  the  posterior  probability  of  target  presence  in  an  image. 
The  EM  was  extended  to  multiple  level  mixture  hierarchies  in  order  to  deal  with  target  trans¬ 
lation  and  scale  invariance.  The  approach  can  account  for  the  non-stationary  statistics  of  the 
input  image  as  well  as  different  noise  models,  two  aspects  that  are  very  important  for  ATR. 
Our  simulation  results  suggest  that  the  scheme  works  very  well  even  in  the  case  of  partially 
occluded  targets  presented  at  scales  for  which  templates  are  not  available. 

•  A  large  number  of  journal  and  conference  papers  have  been  produced  detailing  the  work 
performed  on  this  project.  In  addition,  five  M.S.  theses  have  been  produced  and  another  M.S, 
and  three  doctoral  theses  are  in  progress. 

Publications 

•  A.J.  Devaney  and  K.  Riley,  “Scale  sequential  processing  of  HRR  data  for  target  recognition”, 
in  preparation. 

•  K.  Riley  and  A.J.  Devaney,  “Wavelet  processing  of  images  for  target  detection”.  International 
Journal  of  Imaging  Systems  and  Technology  7,  p. 404-420,  1996. 

•  A.J.  Devaney  and  B.  Hiscomez,  “Wavelet  signal  processing  for  radar  target  identification:  a 
scale  sequential  approach,”  in  Wavelet  Applications,  Harold  Szu,  editor,  Proc.  SPIE  22f2^ 
389-399,  1994. 

•  R.  S.  Raghavan  et.al,  “Performance  of  the  GLRT  for  Adaptive  Vector  Subspace  Detection”, 
IEEE  Trans,  on  Aerospace  and  Electronic  Systems,  1996  (to  appear). 

•  R.  S.  Raghavan  et.al,  “Adaptive  detection  of  signals  in  spherically  invariant  noisef’ IEEE 
Trans,  on  Signal  Processing,  1996,  (in  Review) 

•  J.  Fridman,  E.S.  Manolakos,  “Discrete  Wavelet  Transform  on  a  SIMD  massively  parallel 
platform”,  Proc.  of  the  International  Conference  on  Signal  Processing  Applications  and  Tech¬ 
nology,  (ICSPAT’95),  Boston,  1995. 

•  J.  Fridman,  E.S.  Manolakos,  “On  the  Scalability  of  2-D  Discrete  Wavelet  Transform  Al¬ 
gorithms”,  Multidimensional  Systems  and  Signal  Processing,  (special  issue  on  wavelets  and 
multiresolution  signal  processing),  to  appear  1996. 
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•  H.  Liu  and  H.  Lev-Ari,  “Scale-Recursive  Optimal  Filtering,”  CDSP  Technical  Report  No. 
TR-CDSP-94-25,  Northeastern  University,  Dec.  1994. 

•  H.  Liu  and  H.  Lev-Ari,  “Optimal  Filtering  in  the  Subband-Domain,”  Proceedings  of  the  IEEE 
International  Conference  on  Acoustics,  Speech  and  Signal  Processing,  Detroit,  MI,  May  1995. 

•  Z.  Korona  and  M.M.  Kokar,  “Target  recognition  using  wavelet-based  features,”  in  Proc.  SPIE 
AeroSense  Conference:  Wavelets,  Orlando,  FL,  .April  1996. 

•  Z.  Korona  and  M.M.  Kokar,  “Model-based  fusion  for  multisensor  target  recognition,”  in  Proc. 
SPIE  AeroSense  Conference:  Image  Exploitation  and  Target  Recognition,  Orlando,  FL,  April 
1996. 
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1.  Devaney— Scale  Sequential  HRR 


In  this  Chapter,  M-ary  hypothesis  testing  in  the  wavelet  domain  is  detailed.  The  application  of 
interest  is  target  identification  from  large  data  base  HRR  radar  where  the  hypotheses  are  the  set  of 
targets  with  various  orientations  and  the  signal  used  for  classification  is  the  measured  HRR  return. 
Decisions  are  made  in  a  sequential  fashion  by  starting  at  a  coarse  wavelet  scale  and  proceeding 
to  finer  scales.  As  the  wavelet  scale  is  increased,  hypotheses  are  ruled  out  according  to  a  pre¬ 
defined  decision  criterion.  Such  an  approach  is  computationally  efficient  and  well  suited  to  detection 
problems  with  large  signature  data  bases. 

The  following  sections  present  two  forms  of  a  scale  sequential  detection  algorithm.  The  first 
form  is  optimal  in  the  sense  that  it  achieves  the  same  error  rate  performance  as  the  matched  filter 
detector  implemented  at  finest  scale  but  with  significant  savings  in  computational  time.  The  second 
form  is  a  sub-optimal  algorithm  that  is  capable  of  achieving  even  greater  computational  savings 
with  only  a  marginal  decrease  in  error  rate  performance. 


1..1  Hypothesis  Testing  in  the  Wavelet  Domain 

In  order  to  perform  hypothesis  testing  in  the  wavelet  domain,  a  wavelet  based  interpretation  of 
the  log  likelihood  function  is  required.  To  develop  such  a  quantity,  consider  the  conventional  log 
likelihood  function 

L{k)  =  jdi  |r(()  -  (1) 

where  r[t)  is  a  measured  signal  (e.g.,  a  HRR  return)  and  Sk{t)  a  hypothesis  (e.g.,  member  of  the 
signature  data  base).  For  a  sufficiently  large  wavelet  scale  J  (high  Nyquist  sampling  rate),  the 
Nyquist  sample  values  r(n)  and  Sk{n)  become  equal  to  the  scaling  coefficients  at  scale  J.  We  can 
thus  approxmate  the  measured  return  and  data  base  signal  in  a  scaling  function  expansion  of  the 
general  form 

n  n 

where  (P'1  are  the  scaling  functions  at  scale  J.  Using  the  above  representation  for  both  r{t)  and 
Sk{i)  in  Eq.(l),  the  log  likelihood  ratio  becomes 

L{k)=  [  dt  E  r;;  4,i(t)  -  E  si(k)  4,i(t)\\  (3) 

''  n  n 

which,  after  making  use  of  the  orthonomality  property  of  the  scaling  functions  reduces  to 

L{k)  =  L^lk)  =  Elr^  -  4(k)?-  W 

n 

In  Eq.(4)  the  log  likelihood  function  has  been  completely  expressed  in  terms  of  wavelet  scaling 
coefficients  at  scale  J.  Using  a  similar  approach,  the  log  likelihood  function  can  be  expressed  as  a 
function  of  the  scaling  coefficients  and  wavelet  coefficients  at  scale  J  —  1.  In  particular,  we  find  that 

L(k)  =  Lpk)  =  E[’n■‘-“^'(n]"+Ek^‘-”'»^'(*^)^ 

n  n 

=  L-’-\k)  +  wL'^-^{k).  (5) 
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By  induction,  the  log  likelihood  function  can  be  decomposed  to  an  arbitrary  wavelet  scale  Jg  and 
written  as 

j-\ 

L\k)  =  Y.  wU'{k)  +  L^^{k)  (6) 

j'  =  Jo 

where 

£■'•(*:)  =  El'-i”  -  m 

n 

wL^'{k)  =  (S) 

n 

The  above  expression  for  the  log  likelihood  function  will  be  used  to  perform  hypothesis  testing  in 
the  wavelet  domain.  The  following  section  details  a  computationally  efficient  means  for  computing 
Eci(6)  for  large  signature  databases. 

1..2  Scale  Sequential  Hypothesis  Testing 

The  motivation  for  interpreting  the  M-ary  detection  problem  in  the  wavelet  domain  is  to  reduce 
the  computational  cost  of  making  a  decision.  For  a  large  signature  database,  an  optimal  algorithm 
which  computes  Eq.(l)  for  every  hypothesis  will  exact  a  large  computational  cost.  Such  a  cost  will 
reduce  the  rate  at  which  decisions  can  be  made.  An  attractive  feature  of  Eq.(8)  is  that  the  range 
of  the  summation  decreases  geometrically  with  decrecising  scale  j.  This  feature  can  be  exploited 
by  initially  testing  hypotheses  at  a  low  wavelet  scale  and  sequentially  proceeding  to  higher  scales 
according  to  a  decision  criterion.  By  ruling  out  unlikely  hypotheses  at  low  wavelet  scales,  the  “full” 
log  likelihood  function  (i.e.  L'^{k))  is  computed  over  a  subset  of  the  signature  database  and  a 
computational  savings  is  realized. 

In  short,  the  scale  sequential  detection  scheme  can  be  described  as  follows.  For  a  signature 
database  with  K  hypotheses,  Eq.{7)  is  computed  for  the  entire  signature  database.  A  decision 
criterion  is  employed  to  rule  out  hypotheses  from  consideration  at  the  scale  Jo  +  1.  The  test  continues 
to  scale  Jo  +  1  where  Eqs.(6),  (7),  and  (8)  are  used  to  compute  L'^°'^^{k)  for  only  those  hypotheses 
that  tested  positive  at  scale  Jg.  Similarly,  the  test  proceeds  in  a  sequential  fashion  to  higher  wavelet 
scales.  Once  the  finest  scale  J  is  reached,  the  full  log  likelihood  function  is  computed  over  a  subset 
of  the  signature  database  resulting  in  reduced  computational  cost  and  increased  decision  rate.  The 
following  section  presents  the  decision  criterion  that  has  been  used  to  leave  behind  hypotheses  while 
sequentially  proceeding  up  wavelet  scales. 


1..3  Scale  Sequential  Testing  using  the  Stack  Algorithm 

As  described  in  the  previous  section,  once  the  likelihood  functions  are  computed  at  a  given  scale,  a 
decision  criterion  is  used  to  rule  out  or  leave  behind  hypotheses  from  consideration  at  higher  scales. 
This  section  details  the  use  of  probabilistic  decoding  as  a  decision  criterion. 

The  scale  recursive  detection  problem  can  be  viewed  in  a  probabilistic  decoding  framework  where 
the  encoding  tree  has  K  branches  {K  =  number  of  hypotheses)  stemming  from  the  root  node  and 
a  single  branch  from  all  successive  nodes.  The  depth  of  the  tree  is  equal  to  the  number  of  levels 
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ot  wavelet  decomposition.  The  encoding  tree  for  the  scale  recursive  detection  problem  is  shown  in 
figure  1. 


scale  Jq  scale  Jq+1  scale  Jq+1  scale  J-1  scale  J 

• - • - •  — • - •  path  for 

. - » - *  - ^ - ^  path  for 

Root  ^ 

Node  •  • 


path  for  H 


Figure  1:  Encoding  tree  for  the  scale  recursive  detection  problem 

The  stack  decoder  forms  a  cumulative  metric  from  a  sum  of  branch  metrics.  This  is  entirely 
analogous  to  the  scale  sequential  detection  problem  where  the  likelihood  function  is  built  up  by 
adding  the  contributions  from  the  wavelet  coefficients  at  each  scale.  The  scale  sequential  detection 
problem  can  now  be  summarized  as  follows: 

1.  Compute  the  wavelet  decomposition  of  the  received  signal. 

2.  Compute  L'^°{k)  for  each  hypothesis  and  form  a  sorted  stack. 

3.  Using  (6)  compute  U'^^{k)  for  the  hypothesis  having  the  smallest  L^{k)  and  replace  L^{k) 
with  L^'^'‘-{k)  in  the  stack. 

4.  Repeat  step  3  until  a  hypothesis  reaches  the  finest  scale  and  is  minimum.  This  hypothesis  is 
chosen  as  the  correct  hypothesis. 

With  the  algorithm  detailed,  the  choice  of  stack  metric  can  be  addressed.  In  this  report,  two 
stack  metrics  are  considered.  These  two  metrics  will  be  denoted  L2  and  L2noTm  and  are  stated 
below  for  a  particular  scale  and  hypothesis: 


L‘2.LrAk) 

norm  \  / 


^(r^(n)  -sj:(n))^  -  Nja^ 


(9) 

(10) 
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where  Aj  =  2^  and  cr^  is  the  variance  of  the  additive  white  Gaussian  noise  (AVVGN)  in  the  received 
signal. 

The  Ij2  metric  is  optimal  with  respect  to  error  rate  performance.  This  can  be  explained  by 
noting  that  L2^  is  non-decreasing  with  increasing  j.  Consequently  if  L2'^{k)  is,  for  some  hypothesis 
k,  minimum  compared  with  L2^(k')  for  j  ^  J  and  all  k'  it  is  also  minimum  with  respect  to  all  k' 
at  the  finest  scale  J;  i.e.,  it  remain  minimum  if  all  hypotheses  are  propagated  to  the  finest  scale. 
Testing  all  hypotheses  at  the  finest  scale  corresponds  to  optimum  M-ary  detection  which  leads  to 
the  conclusion  the  stack  algorithm  with  the  L2  metric  is  guaranteed  to  arrive  at  the  same  decision 
as  the  optimal  test. 

The  statistics  of  the  L2  metric  under  both  the  correct  and  incorrect  hypotheses  are: 
correct  hypothesis: 


mean  =  Nja"^  (11) 

variance  =  2Nja’‘^ 

incorrect  hypothesis:  (k=correct  hypoth.  and  l=hypoth.  under  test) 

mean  =  Njcr'^  H — ^  A  k  (12) 

variance  =  ‘2,Nja^  +  Aa^Di^k 

where  A.r-  is  the  Euclidean  distance  measure  defined  as 

Ou=EW(")-4(n)r-  (13) 

n=l 

From  the  statistics  of  the  L2  metric  it  is  apparent  that  as  a  hypothesis  (either  correct  or  incorrect) 
is  propagated  to  higher  scales,  the  mean  and  variance  will  tend  to  increase  with  Nj.  As  previously 
stated,  the  algorithm  will  propagate  the  smallest  cumulative  metric.  If  the  algorithm  is  propagating 
the  correct  hypothesis,  on  average  the  cumulative  metric  will  increase  and  it  is  likely  that  the  correct 
hypothesis  will  not  remain  at  the  top  of  the  stack.  This  results  in  the  algorithm  propagating  incorrect 
liypotheses  (whose  metrics  will  also  tend  to  increase)  before  returning  to  the  correct  hypothesis. 
yVlthough  the  algorithm  will  tend  to  propagate  incorrect  hypotheses  with  the  L2  metric,  under  most 
circumstances  all  incorrect  hypotheses  will  not  be  propagated  to  the  finest  scale.  This  results  in 
significant  computational  savings  while  maintaining  optimality. 

The  L2norm  metric  is  sub-optimal,  but  it  is  considered  because  it  significantly  increases  the  com¬ 
putational  savings  over  the  L2  metric  while  sacrificing  a  small  amount  of  error  rate  performance. 
The  motivation  behind  the  L2noTm  metric  is  to  have  a  stack  metric  that  remains  small  when  prop¬ 
agating  the  correct  hypothesis  and  tends  to  grow  when  propagating  an  incorrect  hypothesis.  By 
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defining  the  L2noTm  metric  as  a  zero-mean,  normalized  version  of  the  L2  metric  along  the  correct 
path  these  requirements  are  satisfied.  The  statistics  of  the  L2noTm  metric  are  stated  below 

correct  hypothesis: 


mean  =  0 
variance  =  2a^ 

incorrect  hypothesis:  (k=correct  hypoth.  and  l=hypoth.  under  test) 


mean  =  —==zDi  k 


(14) 


(15) 


M  "-tLf 

variance  =  2cr  -f 

Nj 

where  Dk,i  is  defined  as  before.  As  required,  under  the  correct  hypothesis  the  L2noTm  metric  has 
a  zero  mean  and  constant  variance.  This  results  in  the  cumulative  for  the  correct  hypothesis 
remaining  close  to  zero  (on  average)  and  therefore  at  or  near  the  top  of  the  stack.  Conversely, 
under  the  incorrect  hypothesis,  the  L2norm  metric  has  a  mean  which  will  diverge  from  zero.  The 
variance  will  also  diverge  but  not  as  fast  as  the  mean  since  the  Dk,i  term  varies  as  1/Aj  as  opposed 
to  \l in  the  expression  for  the  mean.  As  a  result,  the  algorithm  will  quickly  dismiss  an  incorrect 
hypothesis  from  the  top  of  the  stack.  This  facilitates  the  search  for  the  correct  hypothesis  and  once  it 
is  found  the  algorithm  will  quickly  propagate  it  to  the  finest  scale  and  finish.  This  means  that  fewer 
hypotheses  are  propagated  than  would  be  with  the  L2  metric  resulting  in  further  computational 
savings. 


l.,4  Complexity  and  Implementation 

The  detection  schemes  that  the  stack  algorithm  will  be  compared  to  in  the  following  section  are 
strictly  arithmetic.  The  stack  algorithm  on  the  other  hand  uses  both  arithmetic  and  comparison 
operations.  The  arithmetic  operations  come  from  computing  the  stack  metric  while  the  comparisons 
come  from  forming  and  updating  the  stack  (i.e.  sorting  the  stack  and  placing  new  stack  metrics  in 
their  proper  locations).  In  order  to  develop  a  fair  measure  of  complexity  for  the  stack  algorithm, 
both  of  these  operations  must  be  considered.  In  this  paper,  the  mea.sure  of  arithmetic  complexity 
is  taken  to  be  FLOPs/decision.  A  single  FLOP  is  assigned  for  each  addition  and  multiplication. 
The  comparisons  that  are  performed  during  the  execution  of  the  algorithm  are  counted  by  assigning 
a  one  FLOP  penalty  for  each  comparison.  This  penalty  can  be  justified  by  noting  that  a  simple 
comparison  between  two  numbers  could  involve  taking  their  difference  (one  FLOP)  and  checking  for 
positivity.  With  the  measure  of  complexity  defined,  the  implementation  and  computational  analysis 
of  the  algorithm  can  be  addressed. 
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The  method  of  organizing  and  updating  the  stack  can  strongly  affect  the  complexity  of  the 
algorithm.  The  approach  taken  in  this  paper  is  to  first  sort  the  stack  (low  to  high)  at  the  coarsest 
scale.  Since  the  stack  metrics  are  computed  in  a  serial  fashion,  the  sort  consists  of  inserting  each 
new  metric  into  its  proper  position  as  the  stack  is  built  up.  The  insertions  were  made  using  a  binary 
insertion  algorithm  which  on  average  requires  log2  m  comparisons  where  m  is  the  size  of  the  array 
on  which  the  insertion  is  performed.  From  this  follows  that  the  stack  sort  at  the  coarsest  scale 
requires 

K 

^^logjm  comparisons  (16) 

m—2 

where  K  is  the  total  number  of  hypotheses  to  be  tested.  As  previously  stated,  when  propagating 
a  hypothesis,  the  new  stack  metric  (computed  at  the  next  higher  wavelet  scale)  is  placed  into  the 
stack  in  the  proper  location.  This  was  also  accomplished  using  a  binary  insertion  algorithm  thus 
requiring  log2  K  comparisons  for  each  propagation. 

Evaluating  the  number  of  computations  required  to  compute  the  stack  metric  is  straightforward. 
The  L'2  metric  requires  Nj  subtractions,  Nj  multiplications,  and  Nj  —  1  additions  resulting  in  the 
following  expression 


ZNj  —  1  FLOPs/hypothesis.  (17) 

Similarly  the  L'lnorm  metric  requires  ZNj  +  1  FLOPs/hypothesis.  Since  ZNj  is  in  most  cases  the 
dominant  term,  (17)  is  used  for  both  metrics. 

With  the  complexity  of  maintaining  the  stack  and  computing  the  stack  metric  detailed,  a  lower 
bound  on  computational  complexity  can  be  derived.  The  lower  bound  will  correspond  to  the  case 
where  the  algorithm  immediately  propagates  the  correct  hypothesis  to  the  finest  scale  and  finishes 
(i.e.  no  incorrect  hypotheses  are  propagated).  Using  (16)  and  (17)  this  lower  bound  can  be  written 
as 


K  J-l 

r,  =  A'[3-2'^°-l]+ ^log2m+  [3-2'"-l]  +  (J- J<,)log2/v  (18) 

m=2  m=Jo 

where 

Jo  =  the  coarsest  wavelet  scale 
J  =  the  finest  wavelet  scale 
K  =  the  total  number  of  hypotheses. 

The  first  two  terms  on  the  right  hand  side  of  (18)  reflect  the  cost  of  computing  the  stack  metrics 
at  the  coarsest  scale  and  forming  the  stack  itself.  The  last  two  terms  on  the  right  hand  side  of 
(18)  reflect  the  cost  of  immediately  propagating  the  correct  hypothesis  to  the  end  and  finishing. 
The  variability  of  the  computational  effort  makes  it  difficult  to  derive  a  closed-form  expression  for 
the  expected  number  of  computations.  The  analysis  performed  to  date  has  been  via  Monte  Carlo 
simulation  and  is  presented  in  the  following  section. 
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1..5  Scale  Sequential  Stack  Algorithm 


This  section  details  the  performance  of  the  scale  sequential  stack  algorithm.  The  performance  has 
been  evaluated  using  a  synthetic  signature  database  and  received  signals  formed  by  embedding  an 
arbitrary  signature  from  the  database  in  AWGN.  The  performance  of  the  stack  algorithm  has  been 
computed  via  Monte  Carlo  simulation  and  is  compared  with  two  other  detection  schemes.  The 
comparisons  are  based  on  error  rate  performance,  computational  complexity,  and  computational 
savings. 

The  first  algorithm  that  is  used  for  comparison  is  optimum  M-ary  detection.  This  algorithm  is 
optimal  with  respect  to  error  rate  performance.  A  decision  is  formed  by  evaluating  (9)  at  the  finest 
scale  (i.e.  using  all  signal  samples)  for  all  hypotheses  and  choosing  the  hypothesis  associated  with 
the  smallest  metric.  Using  (17),  the  computational  complexity  of  the  algorithm  can  be  expressed 
as 


Topt  =  K[3-2-^ -1]  FLOPS  (19) 

where  K  and  J  are  defined  as  before.  The  second  algorithm  that  is  used  for  comparison  employs  the 
wavelet  decomposition  as  described  in  this  paper  with  a  different  decision  criterion.  The  decision 
criterion  is  a  pruning  algorithm  which  is  summarized  by 

Reject  Hk  if  L^ik)  < 

Accept  Hk  if  H{k)  >  Hrncdian 

where  lj{k)  is  the  log-likelihood  at  scale  j  and  ^median  median  log-likelihood  at  scale  j. 

All  hypotheses  whose  log-likelihoods  fall  below  the  median  at  a  given  scale  are  removed  from 
consideration  at  finer  scales.  Using  this  decision  criterion  and  (17),  the  computational  complexity 
of  the  pruning  algorithm  is  given  by 

Tprune  =  A' [3  •  2-^0  +  3(J  -  Jo)2'^'>-'  -  1]  FLOPs.  (20) 


1..6  Performance  with  Synthetic  Database 

The  error  rate  performance  has  been  evaluated  for  the  stack  algorithm  (both  the  L2  and  L2norm 
metrics)  and  the  two  algorithms  detailed  above.  The  signature  database  was  formed  from  random 
variables  uniformly  distributed  over  the  interval  [0,1].  The  received  signal  was  formed  by  adding 
AWGN  to  one  of  the  signatures  in  the  database.  For  the  data  to  follow,  a  library  of  200  hypotheses 
was  used  with  a  signal  length  equal  to  256  samples.  The  performance  has  been  computed  via  Monte 
Carlo  simulation  where  the  seed  of  the  random  number  generator  was  varied  to  insure  no  correlation 
during  the  creation  of  the  database  and  between  Monte  Carlo  iterations.  The  error  rate  curves  are 
presented  in  Figure  2.  As  expected,  the  stack  algorithm  with  the  L2  metric  and  the  optimal  M-ary 
test  have  an  identical  error  rate  performance.  The  stack  algorithm  with  the  L2noTm  metric  sacrifices 
some  error  rate  performance  because  of  its  sub-optimality.  Additionally,  Figure  2  shows  that  the 
stack  algorithm  with  either  metric  is  superior  in  error  rate  performance  to  the  pruning  algorithm. 
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The  average  number  of  FLOPs /decision  is  defined  as  the  average  number  of  FLOPs  an  algorithm 
reciuires  to  choose  a  hypothesis.  This  quantity  has  been  obtained  using  the  same  Monte  Carlo 
simulations  used  to  compute  the  error  rates.  Consequently,  for  the  data  to  follow  the  simulation 
parameters  are  unchanged.  Figure  3  displays  the  average  FLOPs/dccision  for  the  three  algorithms 
under  consideration.  Once  again,  both  the  L2  and  L2noTm  metrics  are  considered  for  the  stack 
algorithm.  The  stack  algorithm  with  the  L2  metric  offers  a  computational  savings  over  the  optimal 
maximum  likelihood  test  for  signal-to-noise  ratios  (SNRs)  above  -5dB.  These  savings  become  quite 
large  above  5dB  and  the  maximum  savings  occur  above  20dB  where  the  algorithm  converges  to  its 
lower  computational  bound  (Eq.(18))  and  outperforms  the  pruning  algorithm.  The  stack  algorithm 
with  the  L2noTm  metric  offers  dramatic  computational  savings  over  the  optimal  M-ary  test  for  the 
entire  range  of  SNRs  considered.  As  expected  the  savings  are  greater  than  those  of  the  L2  metric 
and  the  algorithm  converges  to  the  lower  computational  bound  faster  (at  approx.  10  dB). 

The  maximum  computational  savings  offered  by  the  stack  algorithm  can  be  best  visualized  be 
plotting  the  quantity 


FLOPs/decision  for  algorithm  under  consideration 
FLOPs/decision  for  optimal  M-ary  test. 


(21) 


This  quantity  is  easily  computed  for  the  both  the  stack  and  pruning  algorithms  using  (18),  (19), 

(20)  and  will  be  denoted  by  Rstack  and  Rprune  respectively.  For  the  stack  algorithm,  the  lower 
computational  bound  derived  earlier  is  used  for  Rstack-  R  should  be  noted  that  when  evaluating 

(21 ) ,  Rstack  is  dependent  on  both  K  and  J  while  Rprune  is  dependent  on  J  only  (Jo  is  assumed  fixed). 
Figures  4  and  5  display  both  Rstack  and  Rprune  for  a  particular  choice  of  parameters.  In  Figure  4, 
the  number  of  hypothesis  is  held  constant  {K  =  10^)  while  J  is  varied.  The  stack  algorithm  is  seen 
to  offer  greater  computational  savings  than  the  pruning  algorithm  for  all  J  tested.  In  Figure  5, 
the  observation  length  is  held  constant  (J  =  8)  while  K  is  varied.  The  stack  algorithm  is  seen 
to  outperform  the  pruning  algorithm  for  hypothesis  sets  with  sizes  in  the  range  [10^,  10®]  which  is 
a  practical  range  for  implementation.  A  more  complete  comparison  between  Rstack  and  Rprune  is 
given  in  tables  1  and  2.  The  data  has  been  rounded  to  facilitate  comparison.  From  tables  1  and  2, 
the  stack  algorithm  is  seen  to  always  outperform  the  optimal  test  (i.e.  Rstack  >  !)•  Additionally, 
the  stack  algorithm  outperforms  the  pruning  algorithm  (i.e.  Rstack  >  Rprune)  for  a  practical  range 
of  test  parameters. 
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prob.  of  error 


Error  Rate  Curves 


Figure  2:  Error  rate  curves  for  the  algorithms  under  consideration 
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FLOPs/decision 
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Figure  3:  FLOPs/decision  for  the  algorithms  under  consideration 
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Computational  savings  versus  observation  length,  K=10''3 


Figure  4:  Computational  savings  versus  J 
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Computational  savings  versus  number  of  hypotheses,  Jmax=8 


Figure  5:  Computational  savings  versus  K 
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Table  1;  Rstack  as  a  function  of  K  and  J 
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38 

33 

9 

9 
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76 

65 

10 

10 

74 

184 

179 

152 

131 

11 

10 

85 

311 

352 
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12 

10 

92 

474 
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606 
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Table  2:  Rprune  as  a  function  of  K  and  J 

J 

K  =  2^ 
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K  =  2' 

2 

1 
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1 

1 

1 

1 

3 

2 

2 

2 

2 

2 

2 

4 

3 

3 

3 

3 

3 

3 

5 

6 

6 

6 

6 

6 

6 

6 

10 

10 

10 

10 

10 

10 

7 

17 

17 

17 

17 

17 

17 

8 

30 

30 

30 

30 

30 

30 

9 

53 

53 

53 

53 

53 

53 

10 

96 

96 

96 

96 

96 

96 

11 

176 

176 

176 

176 

176 

176 

12 

323 

323 

323 

323 

323 

323 
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2.  A  Multi-resolution  based  target  detection  algorithm  — 
R.  S.  Raghavan 

A  multi-resolution  based  automatic  target  detection/recognition  algorithm  decribed  in  our  earlier 
reports  has  been  generalized  to  cases  where  a  target  to  be  detected  or  recognized  is  described  only 
in  terms  ot  the  various  scales  that  make  up  the  target  signature.  Knowledge  of  the  actual  target 
signature  within  each  scale  is  not  required  in  our  formulation.  The  rationale  behind  this  approach 
is  that  target  characteristics  formulated  in  this  fashion  are  somewhat  invariant  to  shifts  in  position 
and/or  orientation  of  the  target  and  therefore  the  approach  provides  some  degree  of  robustness 
in  ensuring  that  targets  are  not  missed  due  to  mismatch  between  the  actual  target  signature  and 
that  used  in  the  ATR  algorithm.  In  our  approach,  target  detection  in  clutter  is  carried  out  using 
a  generalized  likelihood  ratio  test  that  detects  a  signal  contained  in  a  known  subspace  (the  signal 
subspace  is  specified  completely  by  the  scale  information)  embedded  in  clutter  whose  covariance 
properties  within  each  scale  and  across  scales  is  unknown.  The  clutter  covariances  within  each  scale 
and  across  scales  are  estimated  using  clutter  data  from  target-free  regions.  Analytical  e.xpressions 
for  characterizing  the  detection  and  false  alarm  probabilities  of  the  generalized  test  have  been 
derived.  These  results  are  further  generalized  to  the  case  where  the  target  signature  is  modeled  as 
a  random  vector  but  confined  to  the  known  subspace.  The  proposed  scale-subspace  ATR  algorithm 
may  he  simplified  further  if  prior  knowledge  allows  one  to  neglect  clutter  covariances  across  scale. 
In  this  case,  the  proposed  approach  has  several  features  that  make  it  attractive  for  the  ATD/R 
application:  (i)  Test  for  target  presence  in  each  scale  associated  with  the  signal  subspace  is  carried 
out  independently,  (ii)  the  false  alarm  probability  for  each  scale  may  be  specified  independently,  (iii) 
the  approach  offers  a  greater  degree  of  protection  against  missing  a  target  since  the  exact  signature 
of  the  target  is  not  required  in  the  algorithm  and  (iv)  a  sequential  implementation  of  the  algorithm 
may  be  implemented  readily  by  starting  with  the  hypothesis  test  at  a  scale  in  the  signal  subspace 
with  the  lowest  estimated  clutter  power  and  proceeding  with  the  test  towards  scales  with  increasing 
estimated  clutter  power  levels. 


2..1  A  Summary  of  Results  from  Previous  Work 

The  following  section  summarizes  the  main  results  reported  in  earlier  reports  for  the  sake  of  estab¬ 
lishing  continuity  with  the  work  reported  in  the  next  section.  Our  notations  are  briefly  introduced. 
These  notations  were  introduced  in  an  earlier  report  [1]  and  is  given  here  so  that  the  following 
document  is  complete.  Let  represent  a  complex  Af-dimensional  vector  obtained  from  a  given 
resolution  cell  under  test.  The  superscript  J  indicates  the  resolution  level  of  the  received  data 
(larger  the  value  of  index  J,  the  higher  the  resolution).  We  assume  that  A- vectors  from  a  set  of  K 
secondary  resolution  cells  or  reference  cells  which  have  the  same  correlation  properties  as  the  test 
cell  vector  under  the  clutter  only  hypothesis  are  also  received.  These  vectors  from  the  reference 
cell  for  example  are  denoted  by  Using  the  pyrimid  algorithm  (figure  shown  in  previous  re¬ 

port)  all  the  received  vectors  are  decomposed  into  their  various  multi-resolution  components.  Thus 
the  vector  x^"^^  is  equivalently  represented  in  terms  of  the  components  x^^\  ...,  where 

is  the  detail  signal  at  resolution  level  m.  The  dimensions  of  the  various  vectors  obtained  after 
the  MRA  are  generally  different.  Since  is  a  A-dimensional  vector,  the  dimension  of  vector 
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is  i\/2,  the  vector  has  the  dimension  N/4  and  so  on.  The  last  vectors  in  the  MRA  (i.e. 

and  each  have  the  dimension  iV/2'^.  Obviously,  we  have  assumed  that  N  >  2"^  without 
loss  of  generality.  It  is  convenient  to  stack  all  the  vectors  obtained  above  in  a  single  A'^-vector  2. 
Thus 

2..  1.1  Adaptive  detection  in  vector  subspaces 

Referring  to  the  notations  introduced  in  the  previous  section,  let  z(i)  ;i  =  1,2, .. A',  (K  >  N) 
represent  a  set  of  secondary  A-vectors  which  are  assumed  to  be  mutually  independent  and  to  have 
the  same  statistical  properties  as  the  clutter  in  the  primary  A^-vector  2.  Clutter  in  the  primary 
vector  is  modeled  as  a  zero-mean  complex  Gaussian  A-vector  with  correlation  matrix  Rc.  The 
condition  K  >  A^  specified  above  is  required  in  order  to  ensure  that  the  estimated  correlation 
matrix  (with  no  specific  structure  imposed)  is  non-singular  with  probability  1. 

Let  V  denote  the  vector  space  spanned  by  the  set  of  all  complex  Af-vectors.  Let  Vi  denote  the 
signal  subspace  (the  vector  subspace  which  is  known  to  contain  the  signal  if  it  is  present).  The 
dimension  of  the  signal  subspace  is  denoted  by  A,  and  clearly  1  <  Ns  <  N.  We  define  the  subspace 
1-2  of  dimension  N  —  Ng  to  be  the  orthogonal  complement  of  Vi  in  V.  Thus,  K  fj  ^2  =  0  and 
L'l  U  ^^2  =  V-  Since  the  signal  subspace  is  known  (as  described  in  the  previous  section,  the  signal 
subspace  is  the  span  of  vectors  used  as  basis  for  the  relavent  scales  in  which  the  signal  energy  is 
primarily  confined.)  ,we  assume  that  the  components  of  all  A-vectors  have  been  arranged  such 
that  the  projection  of  a  given  A-vector  2  onto  to  the  subspace  Vj  is  Ojv_jv  ]L  The  superscript 
t  denotes  transpose,  zi  is  a  vector  with  Ng  components  and  Om  is  the  zero-vector  of  length  M. 
Thus  the  projection  of  a  given  A- vector  2  onto  the  subspace  V2  is  given  by  [0yv,)^2]^  where  22  is  a 
vector  with  A  —  A*  components.  Therefore  every  A-vector  2  in  the  problem  may  be  represented 
as  2  =  [2{,22]*-  show  in  [1]  that  the  generalized  likelihood  ratio  test  (GLRT)  for  detecting 
a  deterministic  signal  vector  Zg  =  known  to  belong  to  the  subspace  Vj  embedded  in 

zero-mean  comple.x  Gaussian  noise  is  given  by; 

_  1),  (22) 

1  T  ■2^2  *^22^  ^2 

where  the  quantity  {yo  —  1)  is  a  predetermined  threshold  which  as  discussed  below  is  obtained  from 
the  specified  probability  of  false  alarm.  In  (22)  the  Ng  dimensional  vector  21.2  is  defined  as: 

■21.2  =  [zi  —  5'i2*5'22^-22]-  (23) 

Also  in  (22)  the  superscript  f  denotes  conjugate  transpose.  Other  quantities  which  appear  in  (22) 
and  (23)  are  defined  in  terms  of  the  secondary  vector  components  2ri(f)  and  Z2{i)  \  i  =  1,2,  ...A^  as 
follows: 


X^2i(f)2}(i) 
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5.2  =  f:z.(i)4(!) 

i~\ 

521  =  5]22(04(0 

Z  =  1 

522  =  Ylz2{i)z\{i) 

i-\ 

Sx.2  =  Sn- 8,282^ S2X.  (24) 

The  main  aspects  of  the  test  may  be  summarized  as  follows.  The  secondary  vectors  provide 
estimates  of  the  clutter  correlation  and  cross-correlation  matrices  in  the  two  subspaces  Vi  and  V2 
as  shown  in  equation  (24).  These  estimated  matrices  are  needed  for  estimating  the  component  of 
the  primary  clutter  vector  in  the  signal  subspace  V\  using  the  component  of  the  primary  vector  in 
subspace  V2  which  is  known  to  be  signal-free.  The  estimated  clutter  vector  is  given  by  Zi  =  812822  ^2- 
Therefore,  zi,2  =  zi  —  Zi  represents  a  Ns  dimensional  vector  whose  clutter  level  has  been  suppressed 
through  linear  estimation  and  which  in  addition  may  contain  the  signal  to  be  detected.  Note  that 
the  signal  to  be  detected  (if  present)  remains  unaltered  through  this  processing  and  the  clutter 
suppression  attempts  to  increase  the  signal-to-clutter  ratio  in  the  subspace  Vi.  Next,  the  residual 
clutter  vector  in  the  signal  subspace  Vi  is  ‘whitened’  using  the  estimated  correlation  matrix  of  the 
residual  clutter  in  Vi.  This  matrix  may  be  shown  to  be  proportional  to  8\,2  defined  in  (24).  The 
vector  represents  the  output  of  the  whitening  filter  and  the  numerator  of  the  left  hand  side 

of  (  22)  is  the  energy  of  the  vector  output  of  the  whitening  filter.  The  GLRT  compares  this  quantity 

(a  real  and  non-negative  scalar)  to  the  quantity  (t/o  —  1)(1  +  2]  822  ^2)',  which  represents  an  adaptive 
threshold  determined  from  the  primary  and  secondary  clutter  components  in  subspace  V2.  Note 
that  the  equivalent  of  the  Adaptive  Matched  Filter  (AMF)  described  in  [5]  for  the  present  problem 
is  to  compare  the  numerator  of  (22)  with  a  fixed  threshold  instead  of  the  adaptive  threshold  used 
to  implement  the  GLRT.  A  block  diagram  summarizing  these  ideas  is  shown  in  Fig.  6 

2. . 2  Summary  of  New  Results 

The  new  work  reported  here  include  the  following; 

.4nalytical  expressions  (in  closed  form)  for  the  probability  of  detection  as  a  function  of  the  signal- 
to-clutter  ratio  for  the  subspace  algorithm. 

Generalization  of  the  results  for  the  case  where  the  target  signature  is  a  random  vector  known  to 
be  contained  in  a  given  subspace. 

2. . 2.1  Probability  of  Detection  For  Subspace  Algorithm 

In  this  section  we  obtain  analytical  expressions  for  the  probability  of  false  alarm  Pfai  and  the 
probability  of  detection  Pq  for  the  GLRT  in  (22).  We  define  the  following  quantities  for  convenience: 

y  =  z^8~^z 
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(25) 


t  c-i 

y22  =  ^2  S22  Z2 

t  C-l 

J/1.2  =  ^1.2‘^>1.2^1.2- 

We  first  obtain  the  expression  for  Po  for  the  GLRT  in  (22)  as  a  function  of  the  signal-to-interference 

ratio  defined  as:  x  =  z\ R~^ Zsi  where  z^  is  the  signal  to  be  detected.  The  probability  of  false  alarm 
is  obtained  by  setting  the  signal-to-interference  ratio  to  zero  in  the  expression  for  Pq.  Using  upper 
case  symbols  to  denote  random  variables,  we  observe  that  Y  =  ¥22  +  Yi,2  under  both  hypotheses  Ho 
and //i-  Let  the  random  variable denote  1/(1-1-722)  and  so  T  =  ¥1,2  +  —  1.  Since  7i, 2  is  a  non¬ 

negative  random  variable,  the  conditional  probability  of  7  >  (yo-l)  given  i?  =  r  is  1  if  0  <  r  <  1  /yo- 
For  1/t/o  <  r  <  1,  the  above  conditional  probability  is  P[RYi,2  >  {ryo  -  1)|R  =  ul/l/o  <  r  <  1]. 
The  probability  of  7  >  (t/o  —  1)  may  be  computed  by  removing  the  conditioning  for  the  random 
variable  R.  Thus: 


P[Y>iyo-l)\H^,x]  =  I  '°fnir)dr 

Jo 

+  /  /fi(r)P[jR7i.2  >  {ryo  -  l)\Hi,R  =  r, x]f/r. 

Jl/vn 


(26) 


The  probability  on  the  left  hand  side  was  derived  in  [2]  and  is  given  by: 


K-N+l 


P[Y>{yo-l)\Hr,X  =  x]  =  l--^  Y: 


K 


Vd  .=1 


where,  Gi{x)  is  related  to  the  incomplete  Gamma  function  r(f,.T)  by: 


x"  r(f,x) 


G.(x)  -  e  X)  - 


(28) 


.41so,  the  random  variable  R  is  Beta  distributed  under  both  hypotheses  (since  primary  vector 
components  from  only  the  subspace  72,  which  is  signal-free,  are  involved).  Thus, 


/h(^)  =  717 


K\ 


(A^_Ar  +  yV,)!(Ar-iV,-l)! 


K-N^■N> 


(1  -r) 


0  <  r  <  1. 


(29) 


We  seek  to  develop  an  expression  similar  to  that  in  [3]  for  the  probability  of  i?7i.2  >  {yo  ~  1) 
conditioned  on  hypothesis  Hi,  signal-to-interference  ratio  x  and  R  =  r.  By  analogy  with  expressions 
in  [3]  and  [4]  we  assume  that  the  required  solution  is  given  in  the  form  shown  below  but  with  the 
terms  CiGi{xr jyo)  appearing  in  the  sum  on  the  right  hand  side.  The  unknown  coefficients  Ci 
are  then  determined  by  substituting  this  expression  in  (26)  and  using  (27,28,29)  (details  of  the 
integrations  involved  are  described  in  [4]).  Thus: 


P[/?7i.2  >  (yo  -  l)|Lfi,x,i?  =  r]  = 


K-N+l 

k-n+n,  X) 
Vo  i=i 


(K-N^Ns\ 

\  iV,  +  i-l  ) 


(yo-l)^*"*”  ^Gi(xr/yo). 


(30) 
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The  probability  of  detection  Pp  is  obtained  by  averaging  the  conditional  probability  of  detection 
in  (30)  over  the  probability  density  function  of  the  random  variable  R,  given  in  (29).  For  this 
purpose,  we  first  define  the  function  A(-)  as; 

Di{x)  =  f  Gi{xr)fR{r)dr.  (31) 

Jo 

The  probability  of  detection  for  the  GLRT  is  given  by: 


^  i  V  -r 

Pd  =  K-yv+N, 

Vo  t=i 


i  V  -T 


Ns  +  i  —  1 


The  probability  of  false  alarm  Pra  is  obtained  by  setting  the  signal-to-interference  ratio  to  zero  in 
the  above  expression  ( A(0)  =  1)  and  is; 


PpA  —  1 


1  ^+1  (K-N  +  Ns 


,,1<-N+N, 

Vo  i=i 


^  E 


Ns +  i-l 


|(j/o-  1) 


,Ns+i-l 


(33) 


.4n  alternate  expression  for  the  probability  of  false  alarm  which  is  convenient  for  computation  is 
obtained  by  expanding  the  term  within  the  sum  above  in  a  binomial  series  and  is  given  by: 


Pfa  = 


)(i 

^N-l  \y. 


R-N+1 


,F,{K -N  +  l,l-N-J<-N  +  2;  j/o ‘), 


(34) 


where  A'  =  K  —  {N  —  Ns),  N  =  N  —  {N  —  Ns)  =  Ns  and  2^1(0;,  /?;  7;  y)  is  the  Gauss  Hypergeometric 
series  given  by: 


w 


here, 


a  j3-ry)  =  l  +  y 
oc,Pn,y)  1  +  ^^  J,,. 

(35) 

fc-1 

1=0 

(36) 

Note  that  when  =  1,  the  Pfa  evaluated  from  (34)  is  which  coincides  with  the  expres¬ 

sion  in  [3].  We  need  to  define  a  few  quantities  before  the  above  expressions  may  be  applied  to  the 
case  when  Ns  =  N.  When  Ns  =  N,  the  subspace  V2  has  dimension  zero.  Accordingly,  we  define 
V’1.2  =  y,  Y22  =  0  and  so  /^(r)  =  6{r  —  1).  Thus,  Di{x)  in  (31)  is  equal  to  G,(x)  and  as  a  result  the 
Pfa  evaluated  from  (34)  coincides  with  the  expression  in  [4].  We  also  note  that  the  expressions  for 
Pd  given  in  (32)  for  A^^  =  1  and  Ns  =  N  coincide  with  the  expressions  in  [3]  and  [4]  respectively. 


2. .2. 2  Probability  of  Detection  of  the  Subspace  Algorithm  for  Random  Signals 

The  probability  of  detecting  a  deterministic  signal  Zs  which  is  known  to  belong  to  the  signal  subspace 
Vi  using  the  GLRT  in  (22)  as  derived  in  the  previous  section  is  given  in  terms  of  the  signal-to- 

interference  ratio  X  —  zl Zg  as: 
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1  K-N+1  /  \ 

where  Di{.)  is  a  function  defined  in  (31).  The  above  expressions  for  Pr,\  and  Pd  are  valid  for  any 
dimension  of  the  signal  subspace  Vi  between  one  and  N  (i.e.  I  <  Ng  <  N). 

When  the  signal  to  be  detected  is  a  realization  of  a  random  vector  Z.,  confined  to  the  subspace 
V'l,  (22)  is  not  the  GLRT  for  the  detection  problem.  In  fact  if  the  covariance  matrix  of  the  signal  and 
interference  are  both  unspecified  the  GLRT  is  intractable  [7].  To  cpiantify  the  detection  performance 
of  the  test  in  (22)  for  the  random  signal  case,  we  assume  as  before  that  the  signal  vector  Zg  is 
expressed  in  the  form  [Zh  0)v_ArJL  The  vector  Zgi  is  modeled  as  a  complex,  zero-mean  Gaussian 
random  vector  of  dimension  Ng  with  covariance  matrix  Rjii-  The  probability  of  detection  for  the 
random  signal  case  is  obtained  by  averaging  the  probability  in  (37)  which  is  conditioned  on  a  given 

signal-to-interference  ratio  X  =  x  over  the  probability  density  function  of  A'  =  zl R~^Zg.  Given 
Zg  =  [Z^j  random  variable  =  zj R~^ Zg  is  also  expressed  in  terms  of  the  partioned 

interference  covariance  matrix  as  :  X  =  where  R,i.2  =  Rm  —  Rii2Ri22^i2i-  In  a 

manner  similar  to  that  described  in  [4]  it  may  be  shown  that  the  probability  density  function  of 
the  random  variable  A'’  is  determined  by  the  eigenvalues  of  the  matrix:  RgiiRj]^^-  Specifically,  if 
{A,  ;  i  —  1,2,  ...A^s},  denote  the  eigenvalues  of  the  matrix  R3uR~i  2i  fh^n  the  random  variable  X  is 
statistically  equivalent  to  the  sum:  ^iXi,  where  A,-  are  statistically  independent  exponentially 

distributed  random  variables  with  the  mean  value  of  each  random  variable  equal  to  1.  Thus, 

Pd  =  /  P[Y, . 2/{l  +  Y22)  >  {yo  -  l)\X  =x,I-R]fx{x)dx 

JO 

where, 

roo 

Di{a)  =  /  Di{otx)fx{x)dx.  (39) 

The  average  signal-to-interference  ratio  Aq,  is  defined  as  :  Aq  =  E[z] Rj^ Zg]  =  tr[RgR~^]  = 
Fr[/?snRriy  1  where  tr[  ]  denotes  the  trace  operator.  Since  the  trace  of  a  matrix  is  also  equal 
to  the  sum  of  the  eigenvalues  of  the  corresponding  matrix,  the  average  signal-to-interference  ratio  is 
also  given  by  Aq  =  A,-  =  E[X].  Therefore,  the  probability  of  detection  for  given  PfaiK^N  and 

1  <  Xg  <  N  is  not  parameterized  by  the  single  quantity  Ao-the  average  signal-to-interference  ratio, 
as  in  the  deterministic  signal  case,  but  Po  for  the  random  signal  case  depends  on  all  the  eigenvalues 
A,  ;  i  =  1,2,. .As  through  the  density  function  fx{^)  that  appears  in  (39). 

2.. 3  Summary 

In  the  previous  section  we  have  generalized  the  adaptive  subspace  detection  algorithm  (reported 
previously  in  [1]  )  to  the  case  when  the  signal  to  be  detected  is  random  but  confined  to  a  signal 
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subspace.  Analytical  expressions  for  the  detection  probability  are  derived.  To  the  best  of  our 
knowledge  these  results  are  new  to  the  field  of  adaptive  detection  theory.  A  more  detailed  version 
of  these  results  is  published  in  an  article  entitled  “Performance  of  the  GLRT  for  Adaptive  Vector 
Subspace  Detection,”  which  appears  in  the  IEEE  Transactions  on  Aerospace  and  Electronic  Systems, 
October  1996. 
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Figure  6:  Block  Diagram  of  Subspace  GLRT  Algorithm. 
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3.  Scale-Recursive  Optimal  Filtering  -  H.  Lev-Ari 


This  research  has  been  conducted  by  Professor  Hanoch  Lev-Ari  and  the  graduate  student  Hong  Liu. 
The  results  reported  here  are  extracted  from  the  reports:  (i)  H.  Liu  and  H.  Lev-Ari,  “Scale- Recursive 
Optimal  Filtering,”  CDSP  Technical  Report  No.  TR-CDSP-94-25,  Northeastern  University,  Dec. 
1994  (see  also  1995  IEEE  International  Conference  on  Acoustics,  Speech  and  Signal  Processing,  pp. 
2076-2079,  Detroit,  MI,  May  1995);  and  (ii)  H.  Liu  and  H.  Lev-Ari,  “Optimized  Filter  Banks  for  Ef¬ 
ficient  Subband-Domain  Estimation,”  CDSP  Technical  Report  No.  TR-CDSP-96-38,  Northeastern 
University,  Feb.  1996.  A  journal  article  is  currently  in  preparation. 

Multiresolution  analysis  decomposes  a  single  record  into  a  hierarchy  of  signals  at  different  scales, 
i.e.,  into  a  multichannel  configuration  (Fig.  7).  This  enables  the  application  of  efficient  channel- 
recursive  estimation  techniques  to  construct  optimal  (Wiener)  and  adaptive  filters  that  operate 
recursively  from  coarse  to  fine  scale. 


^o(m) 


6(»^) 


^L-i(m) 


Figure  7:  Subband-domain  decomposition  of  a  signal  a;(-). 

We  have  developed  [1]  an  analytical  framework  for  the  implementation  of  optimal  filters  in  the 
multiresolution  (i.e.,  subband)  domain.  Our  approach  to  scale-recursive  optimal  filtering  is  based  on 
carrying  out  the  filtering  operation  in  the  subband  domain,  using  a  selected  subset  of  the  subband- 
domain  signals.  This  approach  allows  to  trade-off  performance  for  computational  cost:  using  a 
fraction  of  the  number  of  subband  channels  we  can  achieve  estimation  errors  that  are  close  to 
the  theoretical  minimum,  and  at  a  fraction  of  the  cost  associated  with  using  the  complete  set  of 
subband  channels.  Though  some  results  on  matching  wavelet-based  filter  banks  to  specific  signals 
have  already  been  published  [2,  3,  4,  5],  previous  research  has  not  addressed  the  issue  of  cost- 
performance  tradeoff  in  subband-domain  optimal  estimation. 

To  be  more  specific,  consider  the  (standard)  problem  of  estimating  a  signal  (say,  target  signature) 
d{-)  from  a  received  (noisy)  signal  a:(-).  This  can  be  accomplished  by  the  celebrated  Wiener  filter 
[6],  which  determines  an  optimal  estimate  d{-)  via  linear  filtering  operation,  viz.,  d  =  T®x,  where 
T(z)  denotes  the  transfer  function  of  the  Wiener  filter  (Fig.  8a).  In  contrast,  we  advocate  carrying 
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out  the  filtering  operation  in  the  subband  domain  (Fig.  8b),  and  using  a  selected  subset  of  the 
subband-domain  signals  Ck{'),  viz.,  @  where 


/  ^o(m)  \ 

Ciim) 


\  J 


(40) 


Plere,  {^i(m) ;  0<i<L  — 1}  denote  the  subband  components  of  the  signal  d(-),  6^'^  is  a  partial 
subset  of  such  channels  (analogous  to  ^^‘^),  denotes  the  response  of  the  subband-domain 

filter,  and  6^  ^  is  an  estimate  of 


rsj 


(a) 


Figure  8:  Linear  filtering  in;  (a)  input-domain  vs.  (b)  subband-domain 


Our  approach  is  motivated  by  the  consideration  that  ^a-(-),  the  estimated  subband  components 
of  the  desired  signal  d(-),  are  to  be  used  as  an  input  for  further  processing,  such  as  target 
detection  and/or  classification.  Moreover,  we  assume  that  such  further  processing  is  to  be  done 
in  the  subband  domain  as  well.  This  dictates  our  choice  to  produce  estimates  of  the  subband 
components  ^fc(-),  rather  than  directly  estimating  d(-)  itself. 

We  have  quantified  the  trade-off  between  quality  of  performance  and  implementation  cost  as¬ 
sociated  with  subband  estimation  schemes.  We  have  demonstrated  [1]  how  this  trade-off  can  be 
controlled  by  several  distinct  factors,  including: 


•  Resolution  level  -  total  number  of  available  subband  channels. 

•  Channel  overlap  -  degree  of  overlap  between  the  frequency  responses  of  subband  channels.  Is 
determined  by  the  choice  of  a  “mother  wavelet.” 

•  Utilization  ratio  -  the  fraction  of  available  subband  channels  that  are  involved  in  the  estimation 
scheme  (given  by  the  ratio  between  the  integer  i  in  (40)  and  the  integer  L,  the  total  number 
of  available  subband  channels). 

•  Inter-channel  estimation  sparsity  -  the  number  of  neighboring  channels  used  in  forming  the 
estimate  5,(-).  This  index  may  range  from  a  value  of  zero,  which  corresponds  to  the  diagonal 
estimation  scheme,  through  a  value  of  1  (tridiagonal  scheme),  to  the  full-complexity  variant, 
which  uses  all  subband  channels. 

•  Filter-order,  or  intra-channel  estimation  sparsity  -  the  number  of  samples  from  each  of  the 
subband  channels  used  in  estimation  of  5,(n)  for  every  time  instant  n. 
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Figure  9;  Wave-packet  configuration  of  filter  bank 


r-layer  filter  bank 


Subset  of 
t/L  channels 


Figure  10;  Structure  of  subband-domain  (suboptimal)  estimator. 


The  first  two  factors  are  determined  by  the  details  of  the  subband  decomposition  scheme  itself. 
For  instance,  the  level  of  resolution  in  wavelet-  and  wavepacket- based  filter-bank  configurations  is 
determined  by  the  depth  of  the  binary  tree  that  characterizes  the  filter-bank,  while  the  channel 
overlap  depends  also  on  the  frequency  characteristics  of  the  prototype  lowpass  filter:  lower  overlap 
can  be  achieved  by  higher  quality  filters,  but  at  the  expense  of  increased  computational  cost.  We 
use  a  filter  bank  with  a  binary-tree  configuration  (Fig.  9),  combined  with  a  sparse  subband-domain 
estimator  (Fig.  10). 

The  remaining  three  factors  depend  on  the  details  of  the  (optimal)  estimation  scheme  used. 
The  utilization  ratio  characterizes  the  subset  of  subband  channels  that  are  actually  involved  in  the 
estimation  process:  ignoring  (i.e.,  not  estimating)  some  of  the  subband  components  5i(-)  results  in  a 
reduction  in  computational  cost,  combined  with  some  degradation  in  overall  performance.  The  no¬ 
tion  of  estimation  sparsity  offers  a  further  refinement  of  the  same  cost- performance  trade-off:  relying 
on  the  relatively  low  overlap  between  non-adjacent  subband  channels  we  can  significantly  reduce 
the  cost  of  computing  each  individual  subband  estimate  5,(-),  which  suffering  only  minor  increase 
in  the  attendant  estimation  error.  Finally,  selecting  the  order  of  the  estimation  filter  (including  the 
possibility  of  channel-dependent  order)  is  get  another  way  to  control  the  cost-performance  trade-off. 
While  order  selection  is  the  only  degree  of  freedom  available  in  classical  optimal  filtering,  here  it 
combines  with  the  other  four  factors  to  offer  a  much  broader  range  of  design  scenarios. 

EST  d(1;N)  in  WVLT  DOMAIN;  SNR=5dB 


Figure  11:  Cost-performance  tradeoff:  utilization  ratio  and  inter-channel  sparsity 
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EST  d(1  :N)  in  WVLT  DOMAIN:  SNR=5dB 


Figure  12:  Cost-performance  tradeoff:  utilization  ratio  and  intra-channel  sparsity 

In  Sec.  3..1  we  introduce  the  notions  of  channel  overlap  and  estimation  sparsity.  We  describe  the 
structure  of  a  proposed  subband-domain  estimator  that  incorporates  the  five  design  factors  men¬ 
tioned  earlier.  In  Sec.  3. .2  we  describe  the  effect  of  these  factors  on  the  cost-performance  trade-off. 
We  show  that  utilization  ratio  and  estimation  sparsity  tend  to  be  the  dominating  factors  in  the 
sense  that  they  allow  the  most  significant  reduction  of  computational  cost  for  a  given  level  of  esti¬ 
mation  error,  and  that  the  most  effective  cost-performance  tradeoff  is  achieved  with  a  memoryless, 
diagonal  estimator  configuration,  i.e.,  with  the  highest  level  of  both  inter-channel  and  intra-channel 
sparsity  (see,  e.g..  Figs.  11,12).  Consequently,  we  have  focused  on  optimizing  the  configuration 
of  the  tree-structured  filter  bank.  This  means  we  have  compared  the  various  possible  subgraphs 
(actually  subtrees)  of  the  wave-packet  configuration  of  Fig.  9  in  terms  of  their  cost-performance 
tradeoff  and  constructed  procedures  for  optimal  selection  of  such  subtrees. 

Such  a  comparison  can  be  conveniently  illustrated  by  a  tradeoff  chart  showing  estimation  error 
(our  measure  of  performance  quality)  vs.  computational  cost,  as  in  Fig.  13.  In  this  chart  every 
point  (indicated  by  an  x)  represents  a  particular  subtree,  i.e.,  its  coordinates  are  the  estimation 
error  E  and  the  computational  cost  C  associated  with  this  subtree.  The  convex  boundary  curve 
at  the  bottom  of  the  chart  characterizes  the  collection  of  optimal  subtrees:  the  subtrees  on  this 
curve  are  optimal  in  the  sense  that  they  minimize  the  cost  function 

J  =  AC  +  (1  -  A)5  ,  0  <  A  <  1  (41) 

which  is  a  convex  combination  of  the  estimation  error  S  and  the  computational  cost  C.  A  particular 
subtree  on  the  boundary  curve  corresponds  to  a  particular  choice  of  the  weight  parameter  A:  as 
A  varies  from  0  to  1,  we  obtain  every  point  on  the  boundary  curve,  from  bottom  right  to  top  left. 
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relative  error 


In  Sec.  3. .3  we  present  an  algorithm  for  obtaining  the  optimal  filter-bank  configuration  for  every 
specific  value  of  the  weight  parameter  A.  The  collection  of  such  configurations  obtained  as  A  ranges 
from  0  to  1  constitutes  the  optimal  family,  i.e.,  the  subtrees  that  determine  the  boundary  curve 
discussed  above.  Since  the  optimal  family  contains  a  finite  number  of  subtrees,  the  boundary  curve 
is  piecewise-linear.  This  means  that  the  range  of  values  of  the  weight  parameter  A  is  divided 
into  subranges,  where  each  subrange  corresponds  to  a  single  subtree  from  the  optimal  family.  Since 
these  subranges  are  not  known  apriori,  the  construction  of  the  optimal  family  (cis  well  as  of  the 
associated  boundary  curve)  requires  a  search  over  the  entire  range  of  A. 

As  an  alternative,  we  present  in  Sec.  3.. 4  an  algorithm  for  (approximate)  determination  of  the 
optimal  family  without  explicit  use  of  the  weight  parameter  A.  This  means  that  a  (subopti- 
mal)  boundary  curve  is  determined  in  a  finite  number  of  steps,  and  without  the  need  to  carry  an 
exhaustive  search  in  A. 

3..1  Efficient  Scale- Recursive  Implementation 

Since  the  subband-domain  optimal  filter  has  multichannel  inputs  and  outputs,  scale-recursiveness 
is,  in  fact,  synonymous  with  channel-recursiveness.  Channel-recursive  optimal,  as  well  as  adaptive, 
schemes  (for  single-rate  channels)  have  been  constructed,  for  instance,  by  Ling  and  Proakis  [7], 
and  by  Lev-Ari  [8].  However,  such  schemes  are  not  directly  applicable  to  nonuniform  multirate 
processing,  i.e.,  to  the  case  when  the  decimation  ratios  Di  of  Fig.  7  are  not  all  the  same.  This 
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happens,  for  instance,  in  the  context  of  wavelet-based  estimation.  This  difficulty  can  be  approached 
in  a  variety  of  ways  including,  in  particular,  techniques  that  result  in  uniform  decimation; 

•  Using  a  so-called  wave-packet  configuration  [9,  10],  which  results  in  equal  decimation  ratios 
Di  in  all  channels  (Fig.  9).  In  contrast,  the  better  known  wavelet  configuration  results  in  a 
binary  dilation  relation  between  the  decimation  ratios,  i.e.,  Di  =  2Z),_i  for  all  i. 

•  Splitting  higher-rate  channels  into  multiple  lower-rate  channels  via  time-multiplexing,  as 
shown  in  Fig.  14  for  the  wavelet  configuration. 


Split  Channel 


Figure  14:  Wavelet  configuration  of  filter  bank,  with  channel  splitting 

The  advantages  and  disadvantages  of  each  approach  will  be  the  subject  of  further  research.  In  this 
report  we  consider  only  the  issue  of  statistical  coupling  between  channels:  using  the  wave-packet 
configuration  makes  it  possible  to  minimize  this  coupling  and,  consequently,  to  achieve  significant 
reduction  in  implementation  cost. 

In  general,  the  introduction  of  multichannel  subband-domain  processing  schemes  results  in  a 
mild  increase  of  the  computational  cost.  For  instance,  while  the  implementation  of  an  M-th  order 
input-domain  FIR  filter  requires  0{M)  computations  per  data  sample,  the  implementation  of  the 
equivalent  subband-domain  filter  requires  )  =  0[L-{M-\-2Mh^  computations  (where 

L  is  the  number  of  bands).  This  is  so  because  the  computational  cost  of  multichannel  estimation  is 
proportional  both  to  the  filter  order,  and  to  the  square  of  the  number  of  channels.  However,  since 
the  subband  rate  is  slower  (by  a  factor  of  L)  than  the  rate  of  the  input  data,  we  conclude  that 
the  computational  cost  of  optimal  (or  adaptive)  FIR  filtering  in  the  subband  domain  is  0{M  + 
2Mf{)  per  input  data  sample.  The  same  orders  of  magnitude  apply  also  to  the  implementation 
of  adaptive  multichannel  FIR  filters  [8].  Thus,  subband  domain  processing  is  comparable  to  input 
domain  processing,  both  in  terms  of  the  achievable  performance  (which  is  exactly  the  same  for  both 
techniques),  and  the  attendant  computational  cost  (which  is  approximately  the  same). 


35 


3. .1.1  Inter-Channel  Correlation  and  Overlap 

Scale-recursive  optimal  filtering  is  particularly  attractive  because  of  its  enhanced  ability  to  trade-off 
performance  for  computational  cost,  without  altering  the  filter  order.  As  explained  earlier,  such  a 
trade-off  can  be  achieved  by  using  a  partial  subset  of  the  subband  domain  channels,  resulting  in  a 
dramatic  reduction  in  computational  cost.  The  attendant  degradation  in  performance  can  be  kept 
relatively  small  by  designing  the  filter  bank  so  as  to  minimize  statistical  coupling  between  channels. 


/ 


Figure  15:  Filter  bank  with  negligible  overlap  between  non-adjacent  subband 


channels. 


Indeed,  by  choosing  the  filters  Hi{z)  in  such  a  way  that  and  have  negligible 

overlap  for  all  |i  —  i|  >2  (Fig.  15),  we  find  that 

for|i-i|>2  (42) 

for  all  1.  This  qualitative  statement  relies  on  the  observation  that  the  correlation  between 
subband  channels  can  be  expressed  in  terms  of  frequency  domain  integrals  that  involve  the  product 
which  is  assumed  to  be  negligible  for  |f  —  j\  >  2. 

.A  quantitative  statement  about  the  relation  between  channel  overlap  and  correlation  of  subband 
signals  must  involve  the  notion  of  channel  overlap  coefficients,  which  we  have  defined  as 


Pij  • — 


/o‘  df 


(43) 


where  :=  Thus,  we  say  that  the  overlap  between  non-adjacent  channels 

is  negligible  if  the  corresponding  overlap  coefficients  are  all  small,  viz.. 


We  have  shown  [1]  that 


pij  1  for  all  |f  —  i|  >2 


(44) 


£{f,w «;(/)} 


<  Pii  llff.il  llffyll  ff  l^n)!’ 


and 


^  Pij  llffill  ||tf,||  £Wn)P 


In  particular,  if  the  filter-bank  corresponds  to  a  wave-packet  configuration,  as  in  Fig.  9,  then  \\Hi\\  := 
|1//||  =  1  for  all  i,  so  that 


<  PiiE\x(n)\-‘ 


Thus,  reducing  the  inter-channel  overlap  indeed  results  in  suppression  of  the  correlation  between 
subband  channels. 
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3. .1.2  Diagonal  and  Tridiagonal  Configurations 

Focusing  on  Si,  the  estimate  of  a  single  subband  component  of  d(-),  we  observe  that  under 
the  assumption  (44),  namely  pij  <C  1  for  all  |i  —  j\  >  2,  the  linear  least  squares  estimate  of 
Si{m)  depends  mostly  on  the  three  signals  ^,(-),  and  while  the  contribution 

from  the  remaining  elements  of  ^(•)  is  negligible.  Constructing  an  estimate  of  6i{m)  that  is 
based  solely  on  these  three  components  of  ^(•)  results  in  an  order  of  magnitude  reduction  in 
computational  cost.  The  small  degradation  in  the  quality  of  the  estimate  depends  on  the  values  of 
pij  for  —  i|  >2  and  can  be  minimized  by  appropriate  design  of  the  filter  bank.  The  resulting 
subband-domain  filter  'T  is  called  tridiagonal,  because  the  estimate  5,-  depends  only  on  the  three 
subband  channels  and  (Fig.  16). 


Figure  16:  The  structure  of  a  tridiagonal  subband-domain  filter. 

A  further  simplification  of  the  filtering  configuration  involves  using  a  diagonal  subband-domain 
filter  'T,  as  shown  in  Fig.  17. 


^1-1 


e.- 


<5. 


6+1 


6+1 


Figure  17:  The  structure  of  a  diagonal  subband-domain  filter. 
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The  corresponding  computational  cost  of  implementing  a  diagonal  or  tridiagonal  subband- 
domain  filter  now  becomes  0{L  •  -|.  2Mh)  computations  per  subband-domain 

sample,  or  per  input  sample.  In  addition  to  this  order-of-magnitude  savings  in  com¬ 

putational  cost,  using  subband-domain  filtering  makes  it  possible  to  achieve  further  cost  reduction 
by  selecting  a  subset  of  the  available  L  subband-domain  channels.  This  may  come  at  the  expense 
of  some  additional  degradation  in  performance. 

3. .1.3  Structure  of  the  Subband-Domain  Estimator 

The  construction  of  a  (suboptimal)  subband-domain  estimation  scheme  involves  the  selection  of 
five  design  parameters,  corresponding  to  the  five  factors  that  affect  the  cost-performance  trade-off 
discussed  in  the  introduction.  These  five  parameters  are: 

•  The  number  of  subband  channels  L,  which  determines  the  resolution  level. 

•  The  transfer  function  of  the  filter-bank,  which  determines  the  degree  of  channel  overlap. 

•  The  utilization  ratio  rf. 

•  The  inter-channel  estimation  order  Mi,  which  determines  the  estimation  sparsity. 

•  The  intra-channel  estimation  order  M2,  which  is  the  number  of  samples  from  each  subband 
channel  used  in  estimation. 

The  inter-channel  and  intra-channel  orders  can  be,  in  general,  different  from  channel  to  channel. 
However,  here  we  consider  only  the  uniform  case,  where  all  channels  use  the  same  inter-  and  intra¬ 
channel  orders. 

In  particular,  we  focus  on  filter-banks  associated  with  the  notion  of  wavelets  and  wave-packets. 
Such  filter-banks  are  constructed  using  a  binary  tree  configuration  which  involves  a  single  undeter¬ 
mined  prototype  lowpass  filter  P{z)  [9].  Thus,  the  resolution  level  of  such  filter-banks  is  determined 
only  by  the  depth  of  the  binary  tree,  while  the  degree  of  channel  overlap  depends  only  on  the  re¬ 
sponse  of  the  prototype  filter  P{z). 

We  use  an  r-layer  wave-packet-based  filter-bank  (Fig.  10),  so  that  the  total  number  of  available 
subband  channels  is  L  =  2^.  We  demonstrate  the  effect  of  utilization  ratio  by  processing  only 
the  subset  of  all  available  channels  corresponding  to  1  <  f  where  r)  can  vary  in  the 

range  I/l  <  <  1.  We  consider  both  inter-channel  and  intra-channel  sparsity.  The  first  type  is 

represented  by  two  sparse  estimation  schemes: 

•  Diagonal  estimation,  in  which  the  estimate  of  ^i(-)  is  constructed  from  samples  of  only 

(Ml  =  1). 

•  Tridiagonal  estimation,  in  which  the  estimate  of  S{(-)  is  constructed  from  samples  of  4,(-)5 
^i_i(-)  and  ^,+i(-)  (Ml  =  3). 
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Together  with  non-sparse  estimation  (which  uses  all  available  channels)  this  gives  rise  to  three 
choices  of  inter-channel  order,  which  we  call  “full”,  “tridiagonal”,  and  “diagonal.”  As  for  the  intra¬ 
channel  order,  we  consider  only  two  cases:  (i)  full,  which  uses  all  Af/2''  samples  available  from  a 
single  channel,  and  (ii)  memoryless,  which  uses  a  single  sample  (M2  =  1). 

We  use  a  block-processing  scheme,  so  that  subband-domain  processing  is  carried  out  at  the 
block-rate  which  is  l/iV  of  the  input  signal  rate,  where  N  is  the  length  of  the  input  data 
block  (see  Fig.  10).  The  total  number  of  samples  per  subband  channel  (per  one  block  of  input 
data)  is  N/2^  where  r  is  the  number  of  layers  of  our  wavelet-based  filter  bank.  The  use  of  a 
block-processing  scheme  introduces  another  reduction  in  the  overall  implementation  cost,  since  the 
number  of  computations  per  one  sample  of  the  input  data  is  reduced  by  a  factor  of  N. 

3.. 2  Cost-Performance  Trade-off 

In  the  example  we  consider  here,  the  signal  x{n)  is  generated  by  adding  a  white  noise  io(n)  to 
desired  signal  d(n),  while  d(n)  is  the  output  of  a  lowpass  filter  excited  by  another  white  noise  which 
is  uncorrelated  to  w{n).  We  now  turn  to  evaluate  the  estimation  error  for  our  example  as  a 
function  of  the  various  design  parameters.  We  scale  the  estimation  error  by  the  (square  root  of) 
the  power  of  the  desired  signal  d(n),  so  that  the  resulting  relative  error  can  be  compared  across 
examples. 


EST  d{1:N)  in  WVLT  DOMAIN:  full  structure 


Figure  18:  Effect  of  utilization  ratio 


39 


3. .2.1  Effects  of  Utilization  Ratio  (?/) 


The  effect  of  the  utilization  ratio  rj  alone  is  shown  in  Fig.  18,  which  involves  three  different  levels 
of  the  signal-to-noise  ratio  (SNR)  in  the  received  signal  x{n).  The  error  reduces  with  increasing  rj, 
the  effect  being  more  pronounced  for  high  SNRs.  This  means  that  we  can  trade-off  performance 
(=  estimation  error)  for  cost  (~  utilization  ratio).  An  explicit  demonstration  of  this  trade-off  is 
provided  in  Fig.  11,  where  each  curve  corresponds  to  varying  the  value  of  t/. 


3. .2. 2  Effects  of  Estimation  Sparsity 

The  introduction  of  inter-channel  sparsity  into  our  estimation  schemes  results  in  very  minor  degra¬ 
dation  in  performance,  except  when  t]  is  close  to  unity  (Fig.  19).  However,  since  sparse  estimation 
recjuires  fewer  computations,  the  cost-performance  trade-off  curves  (Fig.  11)  show  a  dramatic  ad¬ 
vantage  for  sparse  schemes  and,  in  particular,  for  the  diagonal  configuration:  one  can  achieve  a 
reduction  of  up  to  an  order  of  magnitude  (i.e.,  a  factor  of  10)  in  computational  cost  for  a  given  level 
of  estimation  error. 


ESTcl(1:N)  in  WVLT  DOMAIN:  SNR=5dB 


Figure  19:  Effect  of  inter-channel  sparsity 

The  effect  of  intra-channel  sparsity  (i.e.,  filter  order)  is  similar  to  that  of  inter-channel  sparsity 
(Fig.  20).  Again,  using  less  samples  results  in  an  improved  cost-performance  trade-off  (Fig.  12), 
with  up  to  2  orders  of  magnitude  reduction  in  computational  cost. 
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SNR=5dB,  STRUCTURE=full 


Figure  20:  Effect  of  intra-channel  sparsity 
3. .2. 3  Effects  of  Resolution  Level 

Our  block  processing  scheme,  in  which  the  overall  number  of  samples  remains  fixed  across  all 
resolution  levels,  results  in  estimation  errors  that  are  essentially  invariant  with  respect  to  the  level 
of  resolution  (Fig.  21).  However,  the  introduction  of  cost  considerations  complicates  our  analysis: 
for  a  full  (i.e.,  non-sparse)  configuration,  the  cost  of  estimation  reduces  with  r  (the  number 
of  layers),  while  the  cost  of  subband  decomposition  increases  with  r.  On  the  other  hand,  for  a 
diagonal  (i.e.,  maximally-sparse)  configuration,  the  cost  of  estimation  remains  invariant  with  r,  but 
the  performance  should  slightly  degrade  with  increasing  r.  Both  cases  suggest  the  existence  of 
an  optimal  choice  for  r  in  each  particular  instance:  further  analysis  is  needed  to  determine  this 
optimal  choice. 


3. .2.4  Effects  of  Wavelet  Choice 

One  example  of  the  effects  of  filter-bank  selection  is  shown  in  Fig.  22,  where  the  use  of  two  different 
Daubechies  wavelets  is  compared.  While  our  example  shows  a  definite  preference  for  simpler  filter 
banks,  more  research  is  required  in  order  to  determine  the  best  choice  of  a  filter  bank  for  a  given 
estimation  problem. 
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SNR=5dB.  STRUCTURE:  full 


Figure  21:  Effect  of  resolution  level 


SNR=5dB.  STRUCTURE=full 


Figure  22:  Effect  of  wavelet  choice:  comparing  the  use  of  Daubechies  wavelets  of  order  3  and  5. 
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3.. 3  Optimal  Filter  Bank  Configuration 


Our  analysis  of  the  estimation  error  associated  with  a  given  sparse  estimation  scheme  [1]  hcis  resulted 
in  closed-form  expressions  which  we  could  apply  to  the  problem  of  optimal  subtree  selection  [11]. 
In  particular,  we  have  shown  that  the  estimation  error  associated  with  the  memoryless,  diagonal 
sparse  estimation  scheme  described  in  [1]  can  be  expressed  in  the  form 

S  —  So  A£^,'  ,  /\S{  >  0 

i 

where  the  summation  is  over  all  the  vertices  (“binary  splits”)  of  the  subtree,  and  where  ASi  depends 
only  upon  the  power  spectrum  of  the  signal  at  the  node  i.  Similarly,  we  have  shown  that  the 
computational  cost  associated  with  each  subtree  configuration  can  also  be  expressed  in  the  form 

c  =  Co-X:ac.  ,  A(:.>0 

where  again,  the  summation  is  over  all  the  vertices  of  the  subtree. 

relative  error 


Figure  23:  The  cost-performance  chart  and  the  optimal  boundary  curve 


The  pair  (Co,  So)  characterizes  the  empty  subtree,  corresponding  to  processing  in  the  input 
domain,  i.e.,  no  filter-bank  is  being  used.  This  is  the  rightmost  point  on  the  optimal  (boundary) 
curve,  corresponding  to  A  =  0.  As  the  subtree  grows  to  include  more  and  more  vertices,  the 
computational  cost  reduces  and  the  estimation  error  increases,  resulting  in  {C,S)  points  to  the 
left,  and  above,  the  point  (Co,  So)  (Fig.  23).  The  foregoing  discussion  implies  that  we  are  faced  with 
a  graph  optimization  problem,  where  with  each  vertex  we  associate  an  (ordered)  pair  of  positive 
incremental  costs,  namely  (AC,-,  ASi),  such  that  the  overall  cost  associated  with  a  given  subtree  is 


J{X)  =  XC  +  {1-X)S  =  [aCo-H(1-A)Co 


+  5]  [-AAC.  -b(l-A)A£:.- 

iti 


(45) 
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The  optimization  (for  a  given  A)  is  carried  out  with  respect  to  /,  the  set  of  vertices  that  determines 
the  subtree. 

Every  specific  subtree  of  the  tree-structured  filter-bank  of  Fig.  9  results  in  a  point  in  the  C- 
£  chart  (Fig.  23).  The  collection  of  all  possible  subtrees  determines  a  region  in  this  chart  whose 
piecewise-linear  lower  boundary  curve  is  up-convex.  The  optimal  subtree  for  a  given  A  is  that 
boundary  point  where  the  slope  of  the  boundary  curve  equals  — A/(l  —  A)  (Fig.  23).  This  is  so 
because  the  tangent  to  the  curve  at  this  point  is  characterized  by 

dJ  =  Adr-b(l-A)d£  =  0  — >  ^  = - ^ 

^  ^  dC  \-\ 

We  have  constructed  an  algorithm  that  determines  the  optimal  subtree  for  a  given  (fixed)  value  of 
the  weight  coefficient  A  [11],  as  described  below. 

Our  algorithm  is  based  on  the  observation  that  the  cost  function  J(A)  of  (45)  is  additive  with 
respect  to  the  vertices  of  the  subtree.  Also  we  restrict  our  search  to  subtrees  of  the  complete  binary 
tree  of  a  fixed  depth  (say  r),  so  as  to  limit  the  set  of  vertices  over  which  we  need  to  carry  out  our 
search.  Thus,  the  optimal  subtree  can  be  determined  by  pruning^  i.e.,  by  starting  with  a  complete 
binary  tree  (of  depth  r),  and  removing  subsets  of  vertices  for  which  AJ  <  0.  To  be  more  specific, 
the  only  vertex  subsets  considered  for  removal  are  of  the  form 

14  =  set  of  vertices  that  are  the  children  of  vertex  k  (46) 

Thus,  at  any  given  stage  of  our  pruning  procedure  we  look  for  a  vertex  k  for  which 

E  <  0 

The  search  begins  with  the  leaves  of  the  subtree  and  continues,  layer  by  layer,  to  vertices  further 
away  from  the  leaves,  until  one  vertex  with  ^  AJ,  <  0  is  found.  Having  found  a  vertex  like  that 
we  remove  (prune)  all  vertices  in  14  and  repeat  our  search  procedure.  The  procedure  terminates 
when  ^  0  for  every  subset  of  vertices  14  as  defined  in  (46).  The  resulting  subtree  is  a 

member  of  the  optimal  family  and  determines  a  point  on  the  boundary  curve  in  the  C-£  chart. 

3. .4  A  “Greedy  Type”  Algorithm 

In  order  to  avoid  the  need  to  search  over  the  entire  range  of  A  we  introduce  here  an  algorithm 
that  attempts  to  directly  determine  the  corner  points  of  the  (piecewise-linear)  optimal  boundary 
curve.  We  know  that  the  endpoints  of  this  curve  are:  (i)  the  empty  subtree,  which  is  optimal  for 
A  =  0,  and  yields  the  lowest  error  5;  and  (ii)  the  complete  binary  tree  of  infinite  depth  (i.e., 
r  =  oo),  which  is  optimal  for  A  =  1,  and  yields  the  lowest  cost  C .  Thus,  our  algorithm  starts 
with  the  empty  subtree  and  attempts  to  determine  the  remaining  corner  points  one-by-one,  in  the 
order  of  their  appearance  on  the  boundary  curve. 

To  achieve  this  goal  we  increment  our  subtree  at  each  step  by  a  single  vertex  in  such  a  way 
that  the  ratio  ^  associated  with  incorporating  vertex  i  into  the  subtree  is  as  small  as  possible. 
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Thus,  our  algorithm  is  locally-optimal,  i.e.,  it  is  a  “greedy  type”  algorithm.  It  produces  a  sequence 
of  subtrees  where  each  member  in  the  sequence  is  a  subset  of  its  immediate  successor,  and  differs 
from  it  by  a  single  vertex.  The  piecewise-linear  curve  obtained  by  connecting  the  C-£  points  that 
represent  this  sequence  of  subtrees  is  not  necessarily  convex  (or  optimal).  Thus,  an  improved  curve 
can  be  obtained  by  selecting  the  convex  hull  of  the  points  determined  by  our  greedy  type  algorithm 
(Fig.  24).  This  still  yields  a  suboptimal  sequence  of  subtrees. 


Figure  24:  Cost-performance  tradeoff:  the  suboptimal  curve.  Solid  line:  the  convex  hull;  dashed 
line  -  the  output  of  the  greedy-type  algorithm. 


References 

[1]  H.  Liu  and  H.  Lev-Ari,  “Optimal  Filtering  in  the  Subband  Domain,”  CDSP  Technical  Report 
No.  TR-CDSP-94-32,  Northeastern  University,  Dec.  1994.  See  also  Proceedings  of  the  IEEE 
International  Conference  on  Acoustics,  Speech  and  Signal  Processing,  pp.  2076-2079,  Detroit, 
MI,  May  1995. 

[2]  h.  Gilloire  and  M.  Vetterli,  “Adaptive  Filtering  in  Subbands  with  Critical  Sampling:  Analysis, 
Experiments,  and  Application  to  Acoustic  Echo  Cancellation,”  IEEE  Trans.  Sig.  Proc.,  Vol. 
40,  pp.  1862-1875,  Aug.  1992. 

[3]  R.R.  Coifman  and  M.V.  Wickerhauser  “Entropy-Based  Algorithms  for  Best  Basis  Selection,” 
IEEE  Trans.  Information  Theory,  Vol.  38,  pp.  713-718,  Mar.  1992 

[4]  A.H.  Tewfik,  D.  Sinha,  and  P.  Jorgensen,  “On  the  Optimal  Choice  of  a  Wavelet  for  Signal 
Representation,”  IEEE  Trans.  Information  Theory,  Vol.  38,  pp.  747-765,  Mar.  1992. 


45 


[5]  S.G.  Mallat  and  Z.  Zhang,  “Matching  Pursuits  with  Time-Frequency  Dictionaries,”  IEEE 
Trans.  Sig.  Proc.,  Vol.  41,  pp.  3397-3415,  Dec.  1993. 

[6]  C.W.  Therrien,  Discrete  Random  Signals  and  Statistical  Signal  Processing,  Prentice-Hall,  En¬ 
glewood  Cliffs,  NJ,  1992. 

[7]  F.  Ling  and  J.G.  Proakis,  “A  Generalized  Multichannel  Least  Squares  Lattice  Algorithm  Based 
on  Sequential  Processing  Stages,”  IEEE  Trans.  Acoust.  Speech  and  Signal  Processing,  Vol. 
ASSP-32,  pp.  381-390,  Apr.  1984. 

[8]  H.  Lev-Ari,  “Modular  Architectures  for  Adaptive  Multichannel  Lattice  Algorithms,”  IEEE 
Transactions  on  Acoustics,  Speech  and  Signal  Processing,  Vol.  ASSP-35,  pp.  543-552,  Apr. 
1987. 

[9]  P.P.  Vaidyanathan,  Multirate  Systems  and  Filter  Banks,  Prentice-Hall,  Englewood  Cliffs,  NJ, 
1993. 

[10]  M.V.  Wickerhauser,  “Lectures  on  Wavelet  Packet  Algorithms,”  Proceedings  of  INRIA  Work¬ 
shop  on  “Ondelettes  et  Paquets  d’Ondelettes” ,  pp.  31-99,  Roquencourt,  France,  June  1991. 

[11]  H.  Liu  and  H.  Lev-Ari,  “Optimized  Filter  Banks  for  Efficient  Subband-Domain  Estimation,” 
CDSP  Technical  Report  No.  TR-CDSP-96-38,  Northea.stern  University,  Feb.  1996. 


Graduate  Students 

Hong  Liu 


Publications 

H.  Liu  and  H.  Lev-Ari,  “Scale-Recursive  Optimal  Filtering,”  CDSP  Technical  Report  No. 
TR-CDSP-94-25,  Northeastern  University,  Dec.  1994. 

H.  Liu  and  H.  Lev-Ari,  “Optimal  Filtering  in  the  Subband-Domain,”  Proceedings  of  the  IEEE 
International  Conference  on  Acoustics,  Speech  and  Signal  Processing,  pp.  2076-2079,  Detroit, 
MI,  May  1995. 

H.  Liu  and  H.  Lev-Ari,  “Optimized  Filter  Banks  for  Efficient  Subband-Domain  Estimation,” 
CDSP  Technical  Report  No.  TR-CDSP-96-38,  Northeastern  University,  Feb.  1996. 


46 


4.  Parallel  Algorithms  and  Architectures  for  Discrete  Wavelet 
Transforms  —  Elias  S.  Manolakos 

In  this  section  we  provide  a  summary  of  the  main  results  obtained  by  Prof.  Elias  S.  Manolakos  and 
his  graduate  students  on  the  synthesis  of  parallel  computational  structures  for  the  Discrete  Wavelet 
Transform  (DWT).  Improving  algorithm  performance  through  parallelism  has  been  addressed  from 
two  points  of  view:  By  optimizing  aggregate  throughput  in  a  series  of  problem  instances  by  means 
of  modular  VLSI  architectures  having  a  small  number  of  processing  elements  (PEs);  and  by  means 
of  scalable  algorithms  for  parallel  supercomputers  having  a  large  number  of  PEs  and  a  fixed  inter¬ 
connection  network. 

Our  early  results  on  the  synthesis  of  VLSI  architectures  for  the  DWT  appeared  and  in  the 
Proceedings  of  the  SPIE  Conf.  on  Mathematical  Imaging:  Wavelet  Applications  in  Signal  and 
Image  Processing  [1]  and  in  VLSI  Signal  Processing  VI  [2].  More  detailed  journal  papers  also 
including  more  recent  work  both  on  the  synthesis  methodology  introduced  as  well  as  the  resulting 
DWT  array  solutions  are  currently  under  review  in  the  IEEE  Transactions  on  Signal  Processing 

[5,  6] 

Our  results  on  the  parallel  scalable  DWT  algorithms  for  mesh  and  hypercube  networks  have 
been  accepted  for  publication  to  the  special  issue  on  wavelet  based  signal  processing  of  the  Journal 
011  Multidimensional  Signal  Processing  [7]  and  will  appear  in  1996.  Preprints  of  all  recently  accepted 
publications  are  available  on  the  Web  coordinates:  http://www.cdsp.neu.edu/info/manolakos.html. 

To  remain  focused,  in  the  sequel  we  only  highlight  the  main  points  of  our  research  findings 
during  the  whole  three  year  period  of  the  grant.  The  technical  details  can  be  found  in  the  above 
mentioned  refereed  publications  that  are  available,  and  have  also  been  discussed  in  the  submitted 
annual  reports  (April  1994,  April  1995). 

4..1  Synthesis  of  VLSI  Architectures  for  Discrete  Wavelet  Transforms 

Our  main  objective  was  twofold:  to  understand  and  analyze  the  parallelism  inherent  in  the  DWT 
algorithm,  and  to  synthesize  parallel  processing  structures  for  its  computation  that  could  be  imple¬ 
mented  in  VLSI.  The  architectures  derived  have: 

•  Regular  structure  and  0{L)  Processing  Elements  (PEs),  where  L  is  the  number  of  wavelet 
filter  coefficients  (usually  in  the  range  of  4-12). 

•  Minimal  amount  of  memory  distributed  to  the  processors  and  no  global  memory  or  inter¬ 
processor  interconnection  networks  that  limit  architectural  scalability. 

•  Simple  control  distributed  to  the  processors. 

.A  new  methodology  for  synthesizing  distributed  memory  and  control  parallel  computational 
structures  has  been  developed.  It  is  suitable  for  signal  processing  and  numerical  algorithms  with 
partially  irregular  data  dependencies  structure,  such  as  the  DWT,  that  cannot  be  described  by  a 
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set  of  Uniform  or  Affine  Recurrence  Equations  (UREs,  AREs).  For  this  class  of  algorithms  it  is 
known  that  conventional  linear  space-time  mapping  methods  fail  to  provide  efficient  parallel  array 
solutions.  Our  proposed  methodology  is  based  on  constructing  and  applying  the  appropriate  non¬ 
linear  index  space  transformations  (re-indexing)  that  lead  to  a  reformulation  of  the  given  algorithm 
in  two  dimensions  (i.e.  using  only  two  indices)  so  that  after  applying  conventional  linear  space-time 
mapping  very  efficient  linear  processor  arrays  suitable  for  VLSI  implementation  are  derived.  So  it 
can  be  though  of  as  the  appropriate  “pre-processing”  required  to  bring  the  algorithm  in  a  more 
benign  form  at  which  point  the  arsenal  of  linear  mapping  theory  with  all  its  known  advantages  can 
be  called  to  complete  the  synthesis  successfully. 

Several  families  of  systolic  and  semi-systolic  arrays  for  computing  general  (any  number  of  oc¬ 
taves)  DWTs  have  been  synthesized  in  that  way.  To  the  best  of  our  knowledge  they  are  among 
the  most  efficient  (9(//)-processor  solutions  available  in  the  literature,  as  it  is  also  acknowledged  by 
others  (see  for  example  [3]  page  307).  By  using  our  formal  and  top-down  algorithm-to-architectures 
synthesis  approach  we  have  been  able  to: 

•  derive  behavioral  models  of  processor  arrays  that  can  be  easily  translated  to  VHDL  code 

•  prove  analytically  the  correctness  of  the  designs  at  the  behavioral  level  before  building  any 
hardware 

•  compare/contrast  deferent  candidate  designs  in  terms  of  latency  (parallel  running  time),  pro¬ 
cessors  utilization  and  throughput 

In  the  interest  of  space  we  will  not  re-iterate  here  on  results  already  included  in  previous  annual 
reports  or  more  recent  publications.  For  example,  in  the  April  1994  report  we  presented  three 
scalable  array  architectures  which  compute  the  DWT  algorithm  of  an  M-sample  sequence  and  up 
to  any  desired  octave  J.  They  have  been  named  SYST,  BCAST  and  PIPE  respectively  and  have 
been  show  to  achieve  latency  of  3^7/2 -[- 2(2)'^“^  -2,  M-f  2'^“^  -f-3,  and  Mf2  +  2'^~'^  —2  respectively 
(where  M  >  2"^  is  the  input  sequence  length,  L  is  the  number  wavelet  filter  coefficients).  Therefore 
BCAST  is  33%  faster  than  SYST,  and  PIPE  is  66%  faster  than  SYST.  Processor  utilization  for 
SYST,  BCAST  and  PIPE  is  66%,  100%  and  100%  respectively. 

The  systematic  dependence  analysis  and  localization  of  the  algorithm  and  a  3rd-octave  array 
solutions  synthesis  has  appeared  in  the  Proceedings  of  the  SPIE  Conf.  on  Mathematical  Imaging: 
Wavelet  Applications  in  Signal  and  Image  Processing  [1]  .  The  more  general  case  of  arrays  for  any 
desired  number  of  octaves  J  and  sequences  of  length  M  >2'^  have  been  published  in  VLSI  Signal 
Processing  VI  [2].  More  detailed  journal  papers  including  both  the  methodology  introduced  and 
the  resulting  array  solutions  have  been  submitted  and  are  currently  under  review  in  the  prestigious 
IEEE  Transactions  on  Signal  Processing  (5,  6] 

4.. 2  Scalable  Parallel  Algorithms  for  Wavelet  Transforrms  on  the 

mesh  and  hypercube  networks 

VVe  have  also  derived  several  parallel  algorithms  for  1-D  and  2-D  Discrete  Wavelet  Transforms  that 
exhibit  very  good  scalability  properties  on  different  general  purpose  processor  networks  such  as  the 


48 


2-D  mesh  and  the  hypercube.  For  example  our  data  parallel  SIMD  algorithms  can  perform  a  J  levels 
multiresolution  analysis  of  a  length  M  1-D  data  sequence  in  0{JL)  time  on  an  array  of  M  PE’s, 
where  L  is  the  number  of  wavelet  filter  coefRcients.  For  instance,  a  J  =  10  levels  pyramid  DWT 
decomposition  of  a  length  M  =  4096  sequence  can  be  completed  in  20.7msec  on  the  MasPar-1  2-D 
torus  array  [4]. 

We  have  primarily  focused  on  parallel  algorithms  for  the  2-D  DWT  for  performing  efficiently 
multiresolution  analysis  of  large  size  SAR  images.  We  considered  two  slightly  different  versions 
of  the  2-D  DWT,  known  in  the  literature  as  the  Standard  (S-,  or  Beylkin’s)  and  Non-standard 
(NS-,  or  Mallat’s)  forms,  that  are  both  extensively  used  in  different  applications.  We  mapped  both 
forms  onto  P  processors  under  two  data  partitioning  schemes  of  the  input  matrix  M  x  M,  namely 
checkerboard  (-CP)  and  stripped  (-SP)  partitioning.  We  are  introducing  a  precise  performance 
model  for  the  four  parallel  algorithmic  2-D  DWT  variants  in  order  to  investigate  their  scalability 
properties  on  the  widely  used  Mesh  and  Hypercube  P-processor  networks  with  store- and-forward 
as  well  as  cut-through  routing.  By  scalability  here  we  mean  the  ability  of  a  parallel  algorithm  for 
making  efficient  use  of  increasing  computational  resources  (processors)  on  a  given  network. 

We  have  derived  the  asymptotic  isoefficiency  function  of  the  four  algorithms,  that  provides  a 
lower  bound  on  the  rate  of  growth  of  the  input  problem  size  (in  0{NP))  as  a  function  of  the 
processors  P,  necessary  to  maintain  a  constant  level  of  efficiency.  It  has  been  found  that  the  two 
checkerboard  partitioned  algorithms  on  the  Hypercube  and  on  the  cut-through-routed  Mesh  are 
scalable  as  NP  =  D(PlogP)  (Non-standard  form,  NS-CP),  and  cis  A'P  =  Q,{P\og^  P)  (Standard 
form,  S-CP);  while  on  the  store-and-forward-routed  Mesh  they  are  scalable  as  =  fi(F^)  (NS- 
CP),  and  as  NP  =  Cl(P^-'i)  (S-CP),  where  is  the  number  of  elements  in  the  input  matrix,  and 
7  €  (0, 1)  is  a  parameter  relating  M  to  the  number  of  desired  octaves  J  in  the  wavelet  decomposition, 
as  .]  =  [7  log  M] . 

The  Standard  form  algorithm  with  stripped  partitioning  (S-SP)  is  scalable  on  the  Hypercube 
as  AP  =  D(P^),  and  it  is  unscalable  on  the  CT-routed  Mesh  (does  not  assume  an  isoefficiency 
function).  Although  asymptotically  the  stripped  partitioned  algorithm  S-SP  on  the  Hypercube 
would  appear  to  be  inferior  to  its  checkerboard  counterpart  S-CP,  a  detailed  analysis  based  on 
the  often  ignored  proportionality  constants  of  the  isoefficiency  function  indicates  that  S-SP  may 
actually  be  more  efficient  that  S-CP  over  a  realistic  range  of  machine  and  problem  sizes.  A  milder 
form  of  this  result  holds  on  the  Mesh,  where  S-SP  would,  asymptotically,  appear  to  be  altogether 
unscalable. 

Since  the  proportionality  constants  in  our  performance  model  are  expressed  in  terms  of  the 
computation  speed  (1/tc)  S’S  well  as  the  communication  parameters  {ts,th,tw  fhe  message  startup, 
message  header  and  per-word  transfer  times,  respectively)  and  these  parameters  can  be  accurately 
estimated  for  a  given  algorithm/network  pair,  we  can  determine  which  algorithm/parallel  machine 
combination  is  the  most  cost-effective  candidate  for  performing  2-D  DWT  processing  for  ATR/D 
purposes,  given  the  throughput  requirements. 

Our  results  have  been  accepted  for  publication  to  the  special  issue  on  Wavelet  based  signal 
processing  of  the  Journal  on  Multidimensional  Signal  Processing  [7]  and  will  appear  in  1996. 
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5.  Model-Based  Target  Recognition  by  Parameter  Esti¬ 
mation  of  Hiereirchical  Mixtures  -  Elias  S.  Manolakos 


Target  recognition  was  investigated  in  the  framework  of  parameter  estimation  of  hierarchical  mixture 
densities  by  Prof.  Elias  S.  Manolakos  and  graduate  students.  As  a  result,  simple  hierarchical 
classifiers  have  been  developed  for  target/object  recognition  that  have  good  detection  capabilities 
even  in  the  presence  of  substantial  target  occlusion. 

A  modified  version  of  the  Expectation-Maximization  (EM)  algorithm  was  formulated  and  used 
to  derive  recursive  updating  rules  for  the  a-posteriori  probabilities  of  a  specific  target  being  present 
in  the  image.  The  EM  was  extended  to  multiple  level  hierarchies  in  order  to  also  deal  with  target 
translation  and  scale  invariance.  Regularizing  constraints  were  introduced  to  allow  reaching  robust 
decisions  in  an  unsupervised  manner.  A  local  and  recurrent  view  for  the  clustering  algorithm  that 
can  be  easily  parallelized  was  established  after  making  certain  approximations.  The  approach  has 
the  flexibility  to  account  for  the  non-stationary  statistics  of  the  input  data  as  well  as  non-additive 
noise  models,  two  aspects  that  are  very  important  for  SAR  based  ATR. 

Our  simulation  results  suggest  that  the  scheme  works  well  even  in  the  case  of  partially  occluded 
targets  presented  at  scales  for  which  templates  (models)  are  not  available.  The  theoretical  justifica¬ 
tion  of  this  part  of  our  work  and  the  first  simulation  results  with  synthetic  images  have  appeared  in 
the  Proc.  of  the  Conference  on  Information  Systems  and  Sciences,  Princeton,  March  1996  [8]  and 
the  April  1996  annual  report.  More  recent  results  using  natural  images  with  rich  background  and 
occluded  objects  are  accepted  and  will  appear  at  the  Proceedings  of  IEEE  International  Conference 
on  Image  Processing,  October  1996  [9]. 

VVe  present  a  brief  summary  of  these  latest  results  here  as  well.  As  an  example  of  a  real 
world  situation  and  to  test  our  scheme  we  considered  the  non-trivial  recognition  of  traffic  signs  in 
complicated  background.  Since  the  background  has  no  template,  cis  a  first  approach,  we  tried  to 
model  it  crudely  as  an  average  intensity.  This  can  be  very  unreliable  when  the  background  has 
large  variations.  Sometimes  we  use  two  levels:  a  light  and  a  dark  level  to  model  the  background. 
In  any  case  we  are  faced  with  a  situation  where  our  model  for  the  objects  in  the  image  is  highly 
reliable  whereas  the  model  for  the  background  in  very  unreliable.  We  account  for  this  discrepancy 
in  the  level  of  our  confidence  in  different  models  by  introducing  a  “temperature”  term  in  the  error 
database  used  as  prior  knowledge.  The  function  of  the  temperature  in  our  case  is  similar  to  its 
function  in  Monte  Carlo  simulations  except  that  here  we  choose  different  temperature  values  for  the 
objects/ targets  and  the  background(s)  reflecting  our  different  beliefs  in  their  models.  In  particular 
we  use  a  large  temperature  for  the  background  relative  to  the  objects. 

Some  indicative  results  obtained  from  simulations  on  real  images  are  shown  in  Fig.  25.  Also  Fig. 
26  presents  the  results  with  a  semi- synthetic  image  with  a  toy  pony  and  a  tool  that  occlude  each 
other.  It  illustrates  the  ability  of  the  scheme  to  identify  partially  occluded  objects  in  high  noise. 
In  all  these  images,  the  regions  with  high  intensity  (white)  refer  to  regions  of  high  a  posteriori 
probability  of  the  particular  object  being  present. 

Our  experiments  with  real  and  semi-synthetic  images  suggest  that  the  scheme  can  be  used  to 
recognize  objects  in  the  presence  of  occlusion  and  complicated  background.  Furthermore  the  scheme 
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(c)  (d) 


Figure  25;  (a)  Original  Image,  Posterior  pdfs  for  (b)  Traffic  Sign,  (c)  Dark  Background  and  (d) 
Light  Background. 

can  handle  different  noise  models.  It  is  clear  that  a  better  model  of  the  background  will  contribute 
towards  improving  the  quality  of  recognition.  The  ideal  values  for  free  parameters  such  as  the 
number  of  iterations  of  the  EM  loop,  the  size  of  the  neighborhood  and  the  value  of  the  temperature 
of  the  background  in  relation  to  that  of  the  objects  must  be  determined  through  cross-validation. 

VVe  are  currently  working  on  the  problem  of  reducing  the  complexity  particularly  in  memory 
by  implementing  a  hypothesis  pruning  technique  based  on  the  wavelet  decomposition.  We  are  also 
considering  better  modeling  of  the  background  with  the  available  data.  We  are  planning  to  use 
this  approach  for  the  the  automatic  recognition  of  ground  targets  from  SAR  images  if  we  could  get 
Xpatch  generated  target  signatures.  In  this  way  we  will  be  able  to  integrate  wavelet  based  signal 
processing  with  simple  hierarchical  classifiers  “attached”  at  the  leaves  of  the  wavelet  tree  for  the 
purpose  of  flexible  and  efficient  ATR. 
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6.  M.M.  Z^okar  —  Mo  del- Theory  Based  Fusion  Framework 
with  Application  to  Wavelet-Features  Based  Mutisen- 
sor  Target  Recognition 

In  this  report,  we  present  a  model-theory  based  fusion  methodology  for  multisensor  wavelet-features 
based  recognition  called  Automatic  Multisensor  Feature-based  Recognition  System  (AMFRS).  The 
goal  of  this  system  is  to  increase  accuracy  of  the  commonly  used  wavelet-based  recognition  tech¬ 
niques  by  incorporating  symbolic  knowledge  (symbolic  features)  about  the  domain  and  by  utilizing 
a  model-theory  based  fusion  framework  for  multisensor  feature  selection. 

Fusion  is  a  process  of  combining  (fusing)  information  from  different  sensors  when  there  is  no 
physical  (fusing)  law  indicating  the  correct  way  to  combine  this  information  [7].  A  fusion  problem 
can  be  defined  in  terms  of  finding  such  a  fusing  law.  Current  solutions  to  the  fusion  problem 
consist  of  numerous  clever,  problem-specific  algorithms  [5,  11).  It  is  not  clear  what  unifying  theory 
exists  behind  all  the  fusion  algorithms  which  would  guide  algorithm  development.  The  lack  of  an 
unifying  theory  results  in  a  difficulty  in  comparing  various  fusion  solutions  and  often  results  in 
suboptimal  solutions  which  negatively  affect  the  accuracy  of  the  multisensor  recognition  systems. 
Additionally,  fusion  often  increases  processing  time.  A  computer  system  can  be  easily  overloaded 
with  data  from  different  sensors  to  the  degree  that  it  cannot  perform  its  task  within  the  time 
constraints  dictated  by  the  application.  Therefore,  it  has  become  important  to  develop  an  unifying 
approach  for  fusing/combining  data  from  multiple  sensors  or  from  multiple  measurements  with 
minimal  computational  complexity. 

One  of  the  recent  research  efforts  to  develop  a  unifying  fusion  approach  is  described  in  [7],  where 
a  model-theory  based  Sensory  Data  Fusion  System  (SDFS)  is  proposed.  The  SDFS  uses  formal 
languages  to  describe  the  world  and  the  sensing  process.  Models  represent  sensor  data,  operations 
on  data,  and  relations  among  the  data.  Theories  represent  symbolic  knowledge  about  the  world 
and  about  the  sensors.  Fusion  is  treated  as  a  goal-driven  operation  of  combining  languages,  models, 
and  theories,  related  to  different  sensors  into  one  fused  language,  one  fused  model  of  the  world  and 
one  fused  theory.  Kokar  et  al.  [7]  formalizes  the  main  concept  of  the  sensor  data  fusion  theory  in 
terms  of  models  and  theories.  However,  no  algorithm  for  sensor  data  fusion  is  presented  in  [7],  and 
the  proposed  sensor  data  fusion  approach  is  not  tested  on  real  tasks  and  real  noisy  data. 

In  this  work,  we  apply  and  extend  the  model-theory  based  SDFS  to  a  multisensor  wavelet- 
features  based  target  recognition  by  using  the  AMFRS.  The  Discrete  Wavelet  Packet  Decomposition 
(DWPD)  is  used  for  feature  extraction,  and  the  Best  Discrimination  Basis  Algorithm  (BDBA) 
and  a  domain  theory  for  feature  selection  for  each  sensor.  The  BDBA  [10]  is  based  on  a  best 
basis  selection  technique  proposed  by  Coifman  and  Wickerhauser  in  [3].  The  BDBA  selects  the 
most  discriminant  basis.  For  the  purpose  of  this  work,  the  elements  (wavelet  coefficients)  of  the 
most  discriminant  basis  are  considered  as  features.  Then,  the  domain  theories  for  each  sensor 
select  interpretable  wavelet  features  from  the  most  discriminant  basis.  The  wavelet  features  are 
interpretable  in  terms  of  symbolic  target  features.  And  finally,  the  wavelet-features  are  combined 
(fused)  into  one  fused  wavelet-features  vector  by  utilizing  the  fused  theory.  The  fused  theory  is 
derived  from  a  symbolic  knowledge  about  the  domain  and  a  relation  between  the  symbolic  knowledge 
and  the  most  discriminant  basis  determined  by  using  the  BDBA.  We  show  that  the  knowledge  of 
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Figure  27:  Model-Theory  Based  Fusion  Framework 


a  fused  theory  allows  us  to  select  the  combination  of  features  (out  of  many  possible  combinations) 
which  results  in  the  increased  recognition  accuracy  of  the  AMFRS.  In  particular,  we  show  that 
the  recognition  accuracy  of  the  proposed  Automatic  Multisensor  Feature  based  Recognition  System 
(AMFRS)  is  better  than  the  recognition  accuracy  of  a  system  that  performs  recognition  using 
Most  Discriminant  Wavelet  Coefficients  (MDWC)  as  features.  The  AMFRS  utilizes  a  model-theory 
framework  (SDFS)  for  feature  selection  as  described  above,  while  MDWC  are  selected  from  the 
range  and  intensity  most  discriminant  bases  using  a  relative  entropy  measure  [12]. 


6..1  Model-Theory  Based  Feature  Fusion 


In  this  work,  we  consider  only  two  sensors  for  recognition.  However,  all  results  can  be  easily 
extended  to  more  sensors  without  a  loss  of  generality.  Assume  that  ni  features  are  selected  from 
the  measurement  data  of  Sensor  1  and  n2  features  are  selected  from  the  measurement  data  of 
Sensor  2.  The  goal  is  to  combine  (fuse)  these  features  into  one  feature  vector  consisting  of  exactly 
n  features.  Equivalently,  we  want  to  select  n  out  of  ni  -f  n2  features  which  gives  us 


C(n.+n„n)=  f 


(ni  +  n2)! 

(ni  -f  n2  -  n)!n! 


(47) 


possible  combinations.  This  can  be  a  big  number.  For  example,  for  n  =  ui  =  n2  =  10  we  have 
184,756  possible  combinations.  Therefore,  we  need  a  tool  to  select,  from  all  these  combinations,  the 
combination  of  features  which  results  in  most  accurate  recognition  decisions. 

To  select  such  a  combination  of  features,  we  apply  a  model-theory  based  fusion  framework  shown 
in  Fig.  27.  In  particular,  the  knowledge  of  a  fused  theory  allows  us  to  select  the  right  combination 
of  features.  The  fused  theory  is  derived  by  using  a  theory  fusion  operator  to  fuse  the  theory  Ti  for 
Sensor  I  and  the  theory  T2  for  Sensor2.  There  is  no  physical  law  known  and  therefore  there  is  no 
theory  fusion  operator  known  for  fusing  data  from  these  two  sensors.  Instead,  we  derive  a  theory 
fusion  operator  manually  from  a  symbolic  knowledge  about  the  domain  and  a  relation  between  the 
symbolic  knowledge  and  features  determined  by  using  the  BDBA.  By  applying  the  theory  fusion 
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operator  to  the  theory  Ti  and  the  theory  r2,  we  derive  the  fused  theory  T/.  In  the  same  manner, 
we  derive  manually  a  model  fusion  operator.  By  applying  the  model  fusion  operator  to  the  model 
Ml  and  the  model  M2,  we  derive  the  fused  model  M/. 

Fig.  28  shows  a  block-diagram  of  steps  involved  in  the  design  framework  for  Automatic  Multi¬ 
sensor  Feature-based  Recognition  System  (AMFRS).  The  meaning  of  each  block  is  the  following: 

1.  Reference  Database  1  and  Reference  Database  Save  databases  of  target/sound  signatures. 

2.  DWPD  &  BDBA  selects  the  most  discriminant  basis  for  both  reference  databases  by  using 
Discrete  Wavelet  Packet  Decomposition  (DWPD)  and  Best  Discrimination  Basis  Algorithm 
(BDBA). 

3.  Most  Discriminant  Basis  (MDB  1  and  MDB  2)  are  bases  with  maximum  discriminant  power 
with  regard  to  each  of  the  two  reference  databases  as  determined  by  a  relative  entropy  dis¬ 
criminant  measure. 
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4.  Theory  T\  and  Theory  T2  contain  the  domain  knowledge  about  features  in  the  MDB’s  which 
are  interpretable  in  terms  of  symbolic  target/sound  features,  about  target/sound  classes,  and 
about  relations  between  these  features  and  target/sound  classes. 

5.  Model  Construction  creates  a  model  for  a  given  domain  theory. 

6.  Model  Ml  and  Model  M2  represent  features  selected  from  both  sensor  data,  operations  on 
these  features,  and  relations  among  these  features. 

7.  Theory  Fusion  Operation  fuses  domain  theories:  the  theory  Ti  and  the  theory  T2. 

8.  Model  Fusion  Operation  fuses  the  model  Mi  and  the  model  M2  into  one  fused  model  Mj. 

9.  Fused  Model  Mj  represents  the  fused  features. 

10.  System  Implementation  implements  Automatic  Multisensor  Feature-based  Recognition  Sys¬ 
tem  (AMFRS)  using  the  knowledge  of  the  fused  model  Mj  and  the  reference  databcises  for 
training. 

11.  AMFRS  is  the  Automatic  Multisensor  Feature-based  Recognition  System  using  the  proposed 
model-theory  based  wavelet-features  selection  (fusion)  methodology. 

The  diagram  in  Fig.  28  shows  that  the  designer  of  the  AMFRS  is  given  two  reference  databases  of 
target/sound  signatures  corresponding  to  two  sensors.  The  Discrete  Wavelet  Packet  Decomposition 
(DWPD)  is  used  to  transform  these  signatures  into  a  time/frequency  (wavelet)  domain  and  the 
Best  Discrimination  Basis  Algorithm  (BDBA)  is  used  to  select  the  most  discriminant  basis  for  each 
reference  database.  The  designer  also  knows  domain  theories  that  contain  the  knowledge  about 
interpretable  features,  target  classes  and  relations  between  these  features  and  classes.  Knowing 
the  most  discriminant  bases  and  the  domain  theories  for  each  sensor,  he  constructs  the  models 
for  these  domain  theories.  The  knowledge  of  the  fused  theory  allows  the  designer  to  select  the 
right  combination  of  features.  The  fused  theory  7/  is  derived  by  using  a  theory  fusion  operator  to 
fuse  the  domain  theories  Ti  and  T2.  If  there  is  no  physical  law  known  and  therefore  there  is  no 
theory  fusion  operator  known  for  fusing  data  from  these  two  sensors,  the  designer  derives  a  theory 
fusion  operator  manually  from  a  symbolic  knowledge  about  the  domain  and  a  relation  between  the 
symbolic  knowledge  and  interpretable  features.  Using  the  fused  theory,  he  derives  manually  a  model 
fusion  operator.  By  applying  the  model  fusion  operator  to  the  model  Mi  and  the  model  M2,  the 
designer  derives  the  fused  model  Mj. 

In  the  presented  AMFRS  methodology  for  feature  selection  (fusion),  fusion  operators  are  derived 
by  using  reduction,  expansion  and  union  operators  [2].  These  reduction,  expansion  and  union 
operators  are  defined  as  follows: 

•  Reduction  Operator 

A  language  T’’  is  a  reduction  of  the  language  L  if  the  language  L  can  be  written  as 

L  =  U[JX,  (48) 

where  X  is  the  set  of  symbols  not  included  in  L’’. 
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We  form  a  model  M*"  for  the  language  U  as  a  reduction  of  the  model  M  =<  A,I  >  for 
the  language  L  by  restricting  the  interpretation  function  I  —  T  U  Ix  on  L  =  L""  U  X  to  F. 
Therefore 

M''=<A,r>  (49) 

We  form  a  theory  (subtheory)  T’'  for  the  language  L’’  as  a  reduction  of  the  theory  T  for  the 
language  L  by  removing  sentences  from  the  theory  T  which  are  not  legal  sentences  of  the 
language  L’’. 

•  Expansion  Operator 

A  language  T®  is  an  expansion  of  the  language  L  if  the  language  L  can  be  written  as 

I®  =  L  U  X,  (50) 

where  X  is  the  set  of  symbols  not  included  in  L. 

We  form  a  model  M®  for  the  language  L®  =  L\JX  as  an  expansion  of  the  model  M  =<  A,I> 
for  the  language  L  by  giving  appropriate  interpretation  F  to  symbols  in  X.  Therefore 

=<  A,IU  Ix>  (51) 

We  form  a  theory  T®  for  the  language  L®  as  an  expansion  of  the  theory  T  for  the  language  L 
by  adding  a  set  of  new  legal  sentences  of  the  language  T®  to  the  theory  T. 

•  Union  Operator 

A  language  Z-  is  a  union  of  the  languages  Li  and  L2  if  the  language  L  can  be  written  as 

L  =  Li\J  L2.  (52) 

A  model  M  =<  A;  i?,  G,x',I  >  for  the  language  L  is  a  union  of  the  model 

Ml  =<  Ai;ili,Gi,Xi;/i  >  for  the  language  Li  and  the  model  M2  =<  A2;  i?2,  G2,  X2; /2  > 

for  the  language  L2  if  the  model  M  can  be  written  as 

M  =  Ml  U  M2  =<  Ai  U  A2;  Ri  U  R21  Gi  U  G21  ici  U  X2',  It  U I2  >,  (h3) 

where  R,  Ri,  R2  are  relations;  G,  Gi,  G2  are  functions;  x,  xi,  X2  are  constants;  and  7,  It,  I2 
are  interpretation  functions. 

We  form  a  theory  T  for  the  language  L  by  applying  a  union  operator  to  the  theory  Ti  for  the 
language  Lt  and  to  the  theory  T2  for  the  language  L2.  Therefore 

T  =  TiUT2  (54) 


6. .2  Automatic  Multisensor  Feature-based  Recognition  System 

Fig.  29  shows  a  conceptual  view  of  the  Automatic  Multisensor  Feature-based  Recognition  System 
(AMFRS)  for  two  sensors. 

The  meaning  of  each  block  in  the  AMFRS  is  the  following: 
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Figure  29:  Automatic  Multisensor  Feature-based  Recognition  System  (AMFRS) 

1.  The  World  is  a  domain  of  the  recognition  problem  with  Sensor  1  and  SensorS  providing  infor¬ 
mation  about  the  domain  to  the  AMFRS. 

2.  Feature  Extraction  extracts  features  from  measurement  data  using  the  BDBA  and  a  model 
theory  framework. 

3.  Model  (Ml)  Checking  tests  whether  the  measurement  data  represent  a  model  Mi  of  a  given 
theory  Ti  of  the  domain. 

4.  Model  (M2)  Checking  tests  whether  the  measurement  data  represent  a  model  M2  of  a  given 
theory  T2  of  the  domain. 

5.  Model  Fusion  -  the  fusion  operators  for  the  models  Mi  and  M2. 

6.  Model  (Mj)  Checking  tests  whether  the  fused  data  represent  a  model  Mj  of  a  given  theory 
T/  of  the  domain. 

7.  Classification  uses  the  theory  Tj  to  arrive  with  a  final  recognition  decision. 

There  are  three  types  of  information  available  to  the  AMFRS: 

•  the  target  signatures  S2  6  5,  i  =  1,  •  •  • ,  A',  where  5  is  a  signal  space; 

•  classes  Cj  E  C,  j  =  1,2,  of  targets  to  be  recognized,  where  C  is  a  target  class  space; 

•  symbolic  features  bj  €  B  for  each  class  Cj  €  C  of  targets,  where  .B  is  a  set  of  symbolic  features. 
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The  goal  of  the  AMFRS  is  to  increase  the  recognition  accuracy  by  using  all  three  types  of 
information  in  an  interpretable  way.  The  recognition  accuracy  is  defined  as 

number  of  correct  recognition  decisions  ,  . 

recognition  accuracy(7o)  = - ; - ; - - r-^ - - — r-: - *  100  (55) 

total  number  of  recognition  decisions 

where  the  total  number  of  recognition  decisions  is  equal  to  the  number  of  target  signatures  in  a  test 
set.  Similarly,  we  define  mis  classification  rate  as 


misclassi fication  rate{%)  = 


number  of  incorrect  recognition  decisions 
total  number  of  recognition  decisions 


*100  =  XQD— recognition  accurO' 
(56) 


In  this  work,  we  investigate  a  method  for  extracting  features  /2  €  F  of  the  signal  S2  €.  S 
represented  in  the  wavelet  domain,  such  that  these  features  are  interpretable  in  terms  of  the  symbolic 
features  bj  6  B  for  every  target  class  cj  6  C.  We  define  a  recognition  problem  as  a  mapping  S  C. 
This  mapping  is  composed  of  three  mappings:  (i)  S  — +  W,  (ii)  W  — >  F,  and  (iii)  F  —*  C.  The 
mapping  F  — +  VF  is  determined  by  the  Discrete  Wavelet  Packet  Decomposition  (DWPD).  The 
DWPD  transforms  a  sensor  signal  S2  G  S'  into  a  time/frequency  (wavelet)  representation  W2  G  W. 
The  mapping  IF  — >  F  is  determined  by  the  Best  Discrimination  Basis  Algorithm  (BDBA)  and 
a  model-theory  framework.  The  BDBA  selects  the  most  discriminant  basis.  The  model  checker 
selects  features  from  the  most  discriminant  basis.  Conceptually,  a  mapping  F  — >  C  is  determined 
by  a  domain  theory.  The  domain  theory  is  an  optimal  classifier  for  deterministic  target  signatures. 
To  apply  the  domain  theory  to  noisy  target  signatures,  we  implement  it  by  using  a  backpropagation 
neural  network. 


6. .3  Target  Recognition  Scenario 

As  the  example  of  a  target  recognition  problem,  we  consider  a  version  of  a  waveform  recognition 
problem,  originally  examined  and  applied  to  a  ship  recognition  project  in  [1]  and  then  analyzed  in 
[12].  The  goal  of  this  problem  is  to  recognize  the  class  to  which  a  given  target  signature  belongs, 
knowing  a  reference  database  of  target  signatures.  The  reference  database  is  synthesized  using 
different  combinations  of  three  basic  waveforms  to  form  three  classes  of  target  signatures. 

Our  formulation  of  the  recognition  problem,  which  is  described  below,  differs  from  the  original 
waveform  recognition  problem  in  the  following: 

•  the  number  of  the  measurements  in  the  target  signatures  is  increased  from  21  in  [1]  and  32  in 
[12]  to  128; 

•  one  additional  class  of  target  signatures  is  added,  so  that  we  have  a  four-class  problem  instead 
of  a  three-class  problem; 

•  symbolic  features  characteristic  for  each  class  of  targets  are  assumed  to  be  known; 

•  the  recognition  problem  is  extended  to  a  multisensor  scenario  by  using  two  reference  databases 
of  target  signatures  (one  for  each  sensor). 
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Figure  30:  Five  Target  Signatures  for  Each  of  the  Four  Clcisses  of  Targets  in  DTi 
The  DTi  reference  database  of  target  signatures  (corresponding  to  Sensor i)  was  synthesized 


using  the  waveforms 

hi{i)  =  max((24  —  [i  —  25|)/2, 0),  (57) 

^2(0  =  hi{i  -  32),  (58) 

^3(0  =  hi{i  -  16),  (59) 

to  generate  the  following  four  classes  of  target  signatures: 

classl  =  Ci(i)  =  uhi{i)  +  (1  —  u)h2{i)  +  e(f),  (60) 

class2  =  C2{i)  =  uhi{i)  +  (1  —  u)h3{i)  +  e(z),  (61) 

classZ  —  03(2)  =  uh2{i)  +  (1  -  u)h3{i)  +  e{i),  (62) 

classA  =  04(1)  =  (1  —  u)hi{i)  +  e(i),  (63) 


where  i  =  !,•••,  128,  u  is  a  uniform  random  variable  and  €(f)  are  normally  distributed  random 
variables.  Fig.  30  shows  five  examples  of  target  signatures  for  each  of  the  four  classes  of  targets  in 
DTi. 

The  DT2  reference  database  of  target  signatures  (corresponding  to  Sensor2)  was  synthesized 
using  the  following  four  classes  of  target  signatures: 


\  e(i) 

for  i 

=  1,-- 

.,10 

classl  =  cl{i)  =  < 

2uci(i  +  49) 

for  i 

=  11,. 

..,69 

(64) 

[  2(1  -u)ci(e-69) 

for  i 

=  70,. 

••,128, 

r  e{i) 

for  i 

=  !,•• 

•,io 

class2  —  cl{i)  =  < 

2wc2(?  +  49) 

for  i 

=  11,- 

..,69 

(65) 

[  2(1  -  u)c2{i  -  69) 

for  i 

=  70,. 

••,128, 
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Figure  31:  Five  Target  Signatures  for  Each  of  the  Four  Classes  of  Targets  in  DT2 


<i) 

for  f  =  1,  •  • 

•,10 

classS  =  cl{i)  =  < 

2uc3{i  -h  49) 

for  i  =  11,  • 

••,69 

(66) 

^  2(1  -  u)c3{i  -  69) 

for  i  —  70,  • 

••,128, 

1 

f  ^(i) 

for  t  =  1,  •  • 

•,10 

classi  =  =  < 

2uc4{i  +  49) 

for  i  =  11,  • 

••,69 

(67) 

[  2(1  —  u)c4(f  —  69) 

for  i  —  70,  • 

••,128, 

where  i  =  !,•••,  128,  u  is  a  uniform  random  variable  and  e{i)  are  normally  distributed  random 
variables.  Fig.  31  shows  five  examples  of  target  signatures  in  DT2  for  each  of  the  four  classes  of 
targets. 

6. .4  Feature  Selection  for  Target  Recognition 

Each  of  the  target  signatures  in  the  DT\  reference  database  is  transformed  into  the  wavelet  domain 
using  the  DWPD  with  the  Daubechies.G  compactly  supported  wavelets  with  extreme  phase  and 
highest  number  of  vanishing  moments  compatible  with  their  support  width.  A  complete  orthonormal 
Most  Discriminant  Basis  (MDB)  is  selected  using  the  BDBA.  The  total  number  of  components 
(wavelet  coefficients)  in  this  basis  is  equal  to  128,  i.e.,  it  is  equal  to  the  number  of  samples  in  the 
target  signature.  Fig.  32  shows  the  MDB  for  the  DTi  reference  database  and  the  discriminant 
value  of  each  component  as  determined  by  a  relative  entropy  measure.  The  MDB  consists  of  the 
two  subbases  on  the  third  level  of  the  DWPD  decomposition  and  one  subbasis  on  the  second  level 
of  the  decomposition.  In  order  to  limit  the  computational  complexity  of  recognition,  we  apply  a 
domain  theory  (symbolic  knowledge  about  the  target  signatures)  for  the  DTi  reference  database 
to  select  Hi  <  128  wavelet  coefficients  (features)  from  the  complete  MDB.  Our  aim  is  to  select 
ni  features  that,  in  addition  to  significant  discriminant  power,  are  interpretable  in  terms  of  the 
following  symbolic  target  signature  features: 
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Figure  32:  Most  Discriminant  Basis  (MDB)  for  the  DT\  Reference  Database 


•  rising  edges,  where  the  value  of  the  signal  is  increasing  within  a  given  time  interval; 

•  falling  edges,  where  the  value  of  the  signal  is  decreasing  within  a  given  time  interval; 

•  zero  edges,  where  the  value  of  the  signal  remains  around  zero  within  a  given  time  interval; 

•  peaks,  where  the  value  of  the  signal  has  its  local  maximum. 

The  value  of  selected  features  indicates  which  one  of  the  four  symbolic  features  described  above  is 
present  in  the  target  signature,  while  the  position  of  selected  features  in  the  MDB  indicates  the 
time  interval  within  which  a  given  symbolic  feature  is  present  in  the  target  signature. 

Below,  we  describe  an  appropriate  language,  a  model,  and  a  theory  for  Sensori.  The  symbol  < 
is  a  logical  symbol  with  the  usual  interpretation  as  a  linear  order  relation  on  the  universe. 


1.  Language  L\ 

A  feature-level  language  Li  for  Sensori  is  the  following: 

Li  =  {c/assli,c/ass2i,c/ass3i,c/ass4i,/i,0,  l,Cro,Cr,  ,...,C'r4,C7rs},  (68) 

where 

•  c/assli,  c/ass2i,  c/ass3i,  c/ass4i  are  6-placed  relation  symbols  (classes  of  targets  to  be 
recognized  using  Sensori), 

•  fi  is  an  1-placed  function  symbol  (a  function  that  maps  features’  indices  into  features’ 
values), 

•  0, 1  are  constant  symbols, 

•  Cro,  Cri, . . . ,  Cri,Cr^  are  constant  symbols  (features’  indices). 
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2.  Theory  T\ 

A  feature-level  theory  Ti  for  Sensor\  is  the  following. 

c/assli(yo  ...  J/s)  (V^o . yj  ((yo  = /i(Cro)  A  ...  A  j/5  = /i(Cr5))  =4> 

(1  <  J/o  <  2/2  <  J/i  A  1  <  ya  <  ys  <  y-i)),  (69) 

class^liixjQ  ...  2/5)  )'  (^yoi  —  ij/s)  ((yo  “  /iC^ro)  A  ...  ^Ps  —  fiiOn)) 

(1  <  yo  <  yi  A  yo  <  y2  A  1  <  ys  <  y4  <  ya)),  (70) 

c/a553i(yo  ...  ys)  (Vyo,...,y5)  ((yo  =  /l(C'ro)  A  ...  A  J/5  =  /l(C'rs)) 

(0  =  yo  <  yi  <  ya  A  1  <  ya  <  y^  <  ya)),  (71) 

class4:i[yo  ...  ys)  4=^  (^yo,  —  .ys)  ((l/o  =  /i(^ro)  A  ...  A  ys  = 

(1  <  yo  <  y2  <  yi  a  ya  =  y4  =  ys  =  0)),  (72) 

Cro  <  a,  <  a,  <  a,  <  a,  <  a,.  (73) 

3.  Model  Ml 

A  feature-level  model  Mi  of  the  theory  Ti  is  defined  as 

Ml  =<  Ai,/i  >,  (74) 

or  as 

Ml  =<  Ai;  c/cssli,  c/ass2i,  c/ass3i,  c/ass4i, /i,0, 1, . . . ,  5; /i  >,  (75) 

where 

•  Ai  =  {0, . . .  ,5}  is  a  universe  of  the  model  Mi, 

•  classli,class2i,classZi,classii  C  Af  are  relations, 

•  fi  '•  Ai  — »  Ai  is  a  function, 

•  Ii  is  an  interpretation  function  that  maps  symbols  of  the  language  Li  to  appropriate 
relations,  functions  and  constants  in  the  universe  Ai.  It  assigns  relations  classli  C  Af, 
class2i  C  A®,  class3i  C  A®,  and  classAi  C  A®  in  the  model  Mi  to  symbols  classli, 
class2i,  classZi,  and  classii  in  the  language  Li  respectively.  We  use  c/assli,  class2i, 
classZi,  and  c/ass4i  to  denote  both  the  relation  symbols  in  Li  and  the  relations  in  Mi.  Ii 
also  assigns  a  function  /i  :  Ai  — >  Ai  in  the  model  Mi  to  a  symbol  fi  in  the  language  Li. 
We  use  /i  to  denote  both  the  function  symbol  in  Li  and  the  function  in  Mi.  And  finally, 
Ii  assigns  constants  0, . . . ,  5  in  the  model  Mi  to  the  constant  symbols  Ctq  , . . . ,  Cr^  in 
the  language  Li  respectively  and  special  constants  0, 1  in  the  model  Mi  to  the  constant 
symbols  0, 1  in  the  language  Li. 

The  ni  =  6  selected  features  from  the  most  discriminant  basis  (MDB)  using  the  domain  theory 
are  shown  in  Fig.  33.  Fig.  34  shows  the  relationship  between  the  selected  features  and  symbolic 
target  features  in  ideal,  not  noisy  conditions.  All  six  features,  depending  on  their  value  and  position, 
carry  the  information  about  the  type  of  symbolic  target  features  and  their  position  in  a  target 
signature.  The  first  three  features  correspond  to  symbolic  features  in  the  beginning  part  of  a  target 
signature,  while  the  last  three  features  correspond  to  symbolic  features  in  the  middle  part  of  a 
target  signature.  In  particular: 
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Figure  33:  Features  Selected  From  the  MDB  for  the  DTi  Database  Using  the  Domain  Theory 

•  the  value  of  the  first  feature  indicates  the  existence  of  a  zero  edge  characteristic  for  target  sig¬ 
natures  belonging  to  classS  or  the  existence  of  a  rising  edge  characteristic  for  target  signatures 
belonging  to  classl,  class2,  and  dassA  (see  Fig.  34); 

•  the  value  of  the  second  feature  indicates  the  existence  of  a  peak  characteristic  for  target  signa¬ 
tures  belonging  to  classl  or  the  existence  of  a  rising  edge  characteristic  for  target  signatures 
belonging  to  clas$2,  classZ  and  classA  (see  Fig.  34); 

•  the  value  of  the  third  feature  indicates  the  existence  of  a  rising  edge  characteristic  for  target 
signatures  belonging  to  class2  and  classZ  or  the  existence  of  a  falling  edge  characteristic  for 
target  signatures  belonging  to  classl  and  classA  (see  Fig.  34); 

•  the  value  of  the  fourth  feature  indicates  the  existence  of  a  rising  edge  characteristic  for  target 
signatures  belonging  to  classl,  the  existence  of  a  zero  edge  characteristic  for  target  signatures 
belonging  to  classA,  or  the  existence  of  a  falling  edge  characteristic  for  target  signatures 
belonging  to  class2  and  classZ  (see  Fig.  34); 

•  the  value  of  the  fifth  feature  indicates  the  existence  of  a  peak  characteristic  for  target  signatures 
belonging  to  classl,  the  existence  of  a  zero  edge  characteristic  for  target  signatures  belonging 
to  classA,  or  the  existence  of  a  falling  edge  characteristic  for  target  signatures  belonging  to 
class2  and  classZ  (see  Fig.  34); 

•  the  value  of  the  sixth  feature  indicates  the  existence  of  a  falling  edge  characteristic  for  target 
signatures  belonging  to  classl,  class2  and  classZ,  or  the  existence  of  a  zero  edge  characteristic 
for  target  signatures  belonging  to  classA  (see  Fig.  34). 

The  features  selected  from  the  MDB  using  the  theory  Ti  are  different  from  the  Most  Discriminant 
Wavelet  Coefficients  (MDWC)  which  are  shown  in  Fig.  35. 

In  the  same  manner  cis  for  the  Sensori,  we  define  an  appropriate  language,  a  model,  and  a  theory 
for  Sensor2.  The  symbol  <  is  again  cissumed  to  be  a  logical  symbol  with  the  usual  interpretation 
as  a  linear  order  relation  on  the  universe. 
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Figure  34:  Relationship  Between  the  Selected  Features  and  Symbolic  Target  Features  for  the  DTi 
Database 

1.  Language  L2 

A  feature-level  language  L2  for  Sensor2  is  the  following: 

I2  =  {c/assl2,c/ass22,c/ass32,c/ass42,/2,0, 1,2,  C,o,C, C", >,(7, (76) 


where 

•  classl2,class22.,class32,class4:2  are  9-placed  relation  symbols  (classes  of  targets  to  be 
recognized  using  Sensor 2), 

•  f2  is  an  1-placed  function  symbol  (a  function  that  maps  features’  indices  into  features’ 
values), 

•  0,1,2  are  constant  symbols, 

•  Cig,  Ci^,. . . ,  Cij,  Ci^  are  constant  symbols  (features’  indices). 

2.  Theory  T2 

A  feature- level  theory  T2  for  Sensor 2  is  the  following. 

classhiyo  ...  ys)  (Vj,g, ... ,„,)  ((?/o  =  /2(C',o)  A  ...  ^ys  =  f2{Ci^)) 

(2  <  j/o  <  2/2  <  J/i  A  2  <  t/5  <  2/3  <  t/4  A 

2  <  2/6  <  yr  A  2  <  2/8  <  yr)),  (77) 

class22{yo  ...  ys)  > — r  (^yo....  .vs)  ((yo  = /2(C',o)  A  ...  A  ys  = /2(C,g))  => 

(1  <  y2  <  yi  <  yo  A  2  <  ya  <  y4  <  ys  A 

y?  <  ye  <  1  A  ys  =  0)),  (78) 

class32{yo  •••  ys)  ^ ^  (^yo>  — .yg)  ((yo  =  f^iCio)  A  •••  A  ys  =  /2(C’,g)) 

(2  <  yo  =  yi  A  y2  <  yi  A  1  <  ys  <  y4  <  ys  A 


66 


7.5 


Normalized  Frequency 

Figure  35;  Most  Discriminant  Wavelet  Coefficients  (MDWC)  Selected  from  the  MDB  for  the  DT\ 
Database  as  Features 


2  <  ye  <  t/7  A  2  <  ys  <  yr)),  (79) 

c/ass42(yo  ...  ys)  4=^  (Vj,^ . yj  ((yo  = /2(C'.o)  A  ...  A  yg  = 

(yo  =  yi  =  y2  =  ye  =  yr  =  ys  =  0  a  2  <  yg  <  y4  A 
2  <  ys  <  y4)),  (80) 

Ci,  <  Ci,  <  Ci,  <  Ci,  <  Ci,  <  Ci,  <  Ci,  <  Ci,  <  Ci,.  (81) 

3.  Model  M2 

A  feature-level  model  M2  of  the  theory  T2  is  defined  as 


M2  =<  A2J2  >, 

(82) 

or  as 

M2  =<  A2]  classl2,  class22,  class32,  class^2, 72?  0, 1, . . . ,  8;  /2  >, 

(83) 

where 

•  i42  =  {0, . . . ,  8}  is  a  universe  of  the  model  M2, 

•  classl2,class22,classZ2,classA2  C  Aj  are  relations, 

•  /2  :  A2  ^  A2  is  a  function, 

•  I2  is  an  interpretation  function  that  maps  symbols  of  the  language  L2  to  appropriate 
relations,  functions  and  constants  in  the  universe  A2.  It  assigns  relations  class\2  C  A®, 
class22  C  Aj,  classZ2  C  A|,  and  class4i2  C  A®  in  the  model  M2  to  symbols  classl2, 
class22,  class32,  and  classA2  in  the  language  L2  respectively.  We  use  classl2,  class22, 
classZ2,  class^2  to  denote  both  the  relation  symbols  in  L2  and  the  relations  in  M2,  h 
also  assigns  a  function  /2  :  A2  A2  in  the  model  M2  to  a  symbol  /2  in  the  language  L2. 
We  use  /2  to  denote  both  the  function  symbol  in  L2  and  the  function  in  M2.  And  finally, 
I2  assigns  constants  0, . . . ,  8  in  the  model  M2  to  the  constant  symbols  C,,, , . . . ,  C,-g  in  the 
language  L2  respectively  and  special  constants  0,1,2  in  the  model  M2  to  the  constant 
symbols  0, 1,2  in  the  language  L2. 
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Figure  36:  Features  Selected  From  the  MDB  for  the  DT2  Database  Using  the  Domain  Theory 

The  n2  =  9  selected  features  from  the  MDB  using  the  domain  theory  (symbolic  knowledge  about 
the  target  signatures  in  the  DT2  database)  are  shown  in  Fig.  36.  More  details  on  feature  selection 
for  Sensor 2  can  be  found  in  [9]. 

1.  Fused  Language  Lj 

A  feature- level  fused  language  £-/  is  the  following: 

Lf  =  {classl,class2,class3,class4:,ff,0, 1,2, C/g,  C/, , . .  •  ,C'/,,C/g},  (84) 

where 

•  classl,class2,class3,classi  are  9-placed  relation  symbols  (classes  of  targets  to  be  rec¬ 
ognized  using  fused  data), 

•  fj  is  an  1-placed  function  symbol  (a  function  that  maps  features’  indices  into  features’ 
values), 

•  0, 1,2  are  constant  symbols, 

•  C'/o,  C/j , . . . ,  C/j,  C/g  are  constant  symbols  (features’  indices). 

2.  Fused  Theory  Tj 

A  feature- level  fused  theory  Tf  is  the  following. 

c/assl(j/o  -  ys)  (Vj,g . j,g)  {{yo  =  //(C/g)  A  ...  A  ys  =  => 

(1  <  J/o  <  y2  <  yi  A  1  <  1/3  <  J/s  <  2/4  A 
2  <  ye  <  y?  A  2  <  ys  <  yj)),  (85) 

class2{yo  ...  ys)  < — >  -  .m)  ((j/o  = //(C/o)  A  ...  A  ys  = //(C/g))  =4* 

(1  <  yo  <  yi  A  yo  <  ya  A  1  <  ys  <  y4  <  ya  A 
y?  <  ye  <  1  A  ys  =  0)),  (86) 

class3{yo  ...  ys)  (Vj,g, ...  ,y,)  {{yo  =  //(C/g)  A  ...  A  ys  =  //(C/g)) 
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(0  =  yo  <  yi  <  2/2  A  1  <  2/5  <  2/4  <  2/3  A 
2  <  2/6  <  2/T  A  2  <  2/8  <  2/7)), 

class4{yo  ...ys)  ^  (Vyo.  ....j/g)  ((yo  = //(C/o)  A  ...  A  2/8  = //(C/,)) 

(1  <  yo  <  y2  <  yi  A  j/3  =  1/4  =  2/5  =  ye  =  yr  =  y8  =  0)),  (88) 

C/o  <  Cy,  <  Cj,  <  Cj,  <  Cu  <  Cs,  <  Cu  <  Cj,  <  Cj,,  (89) 

where 

C/o  =  Ctq  a  C/j  =  Cr,  A  C/j  =  Ct2  a  C/3  =  Cra  A  C/^  =  Cr4  A 

C/j  =  Crj  A  C/g  =  Cig  A  C/y  =  Cii  A  C/g  =  C,g ,  (90) 

//(C/.)  =  /.(a.)  A  //(c,,)  =  /.(c.,)  A  =  /.(Cr.)A 

//(C/.)  =  (/.(C.,)  A  =  /.(C.,)  A  =  h(C,,)  A 

fAC,.)  =  h(Ci.)  A  //(C,.)  =  h(Ci,)  A  //(C/,)  =  h(Ci,).  (91) 

3.  Fused  Model  Mj 

A  feature-level  fused  model  M/  of  the  theory  Tj  is  defined  as 

Mf=<A,Ij>,  (92) 

or  as 

Mf  —<  A;  c/assl,c/ass2,  c/ass3,c/ass4,//,  0, 1, . . . ,  8;  7/  >,  (93) 

where 

•  A  =  {0, . . . ,  8}  is  a  universe  of  the  model  Mj, 

•  classl  C  A®  is  a  relation,  where  classl  —  classli^  x  classli  and  c/assl^  =  c/assl2n  (AH 

{6,7,S})^ 

•  class2  C  A®  is  a  relation,  where  class2  =  class2i  x  cla$$2'2  and  class2'2  =  class22  fl  (An 

{6,7,8})^ 

•  classS  C  A®  is  a  relation,  where  classS  =  class3i  x  classS2  and  class^2  =  classS2  n  (AH 

{6,7,8})^ 

•  classi  C  A^  is  a  relation,  where  class4  =  c/ass4i  x  class42  and  class42  =  class42  n  (A  n 

{6,7,8}f, 

•  //  :  A  A  is  a  function,  where  //  =  /j  U  and  =  /2  Un{6,7,8}, 

•  If  is  an  interpretation  function  that  maps  symbols  of  the  language  Lj  to  appropriate 
relations,  functions  and  constants  in  the  universe  A.  It  assigns  relations  classl  C  A^, 
class2  C  A®,  classZ  C  A®,  and  class4  C  A^  in  the  model  Mj  to  symbols  classl,  class2, 
class3,  and  class4  in  the  language  Lj  respectively.  We  use  classl,  class2,  classl,  class4 
to  denote  both  the  relation  symbols  in  X/  and  the  relations  in  Mf.  If  also  assigns  a 
function  //  :  A  A  in  the  model  Mf  to  a  symbol  //  in  the  language  Lf.  We  use  //  to 
denote  both  the  function  symbol 'm  Lf  and  the  function  in  Mf.  And  finally,  7/  assigns 
constants  0, ...  ,8  in  the  model  Mf  to  the  constant  symbols  C f^,. . .  ,C  f^  in  the  language 
Lf  respectively  and  special  constants  0,1,2  in  the  model  Mf  to  the  constant  symbols 
0, 1, 2  in  the  language  Lf. 
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The  fused  theory  and  fused  model  are  used  to  fuse  the  feature  vector  consisting  of  ni  =6 
features  from  the  DT\  database  and  the  feature  vector  consisting  of  n2  =  9  features  from  the  DT2 
database  into  one  fused  feature  vector  consisting  of  n/  =  9  interpretable  features.  In  particular: 

•  the  first  three  fused  features  of  this  fused  feature  vector  are  the  first  three  features  selected 
for  Sensor\,  whose  value  indicates  the  existence  of  the  first  peak  characteristic  for  target 
signatures  belonging  to  classl  and  classA,  the  existence  of  a  rising  edge  characteristic  for 
target  signatures  belonging  to  class2,  or  the  existence  of  a  zero  edge  characteristic  for  target 
signatures  belonging  to  classS] 

•  the  next  three  fused  features  are  the  last  three  features  selected  for  Sensori,  whose  value 
indicates  the  existence  of  the  second  peak  characteristic  for  target  signatures  belonging  to 
classl,  the  existence  of  a  falling  edge  characteristic  for  target  signatures  belonging  to  class2 
and  classS,  or  the  existence  of  a  zero  edge  characteristic  for  target  signatures  belonging  to 
classA; 

•  the  last  three  fused  features  of  this  fused  feature  vector  are  the  first  three  features  selected  for 
Sensor 2,  whose  value  indicates  the  existence  of  a  peak  characteristic  for  target  signatures  be¬ 
longing  to  classl  and  classZ,  or  the  existence  of  a  zero  edge  characteristic  for  target  signatures 
belonging  to  class2  and  classA. 

6. . 5  Results  of  Experiments 

We  test  the  recognition  accuracy  of  the  AMFRS  by  using  two  test  databases  with  target  signatures 
corresponding  to  the  DT\  and  DT2  reference  databases.  Each  of  the  target  signatures  in  these 
databases  is  transformed  into  the  wavelet  domain  using  the  DWPD.  Then,  we  select  one  target 
signature  from  each  of  the  test  databases  in  such  a  way  that  these  two  target  signatures  correspond 
to  the  same  target.  A  framework  of  model  theory  (SDFS)  is  used  to  extract  features  from  each  of 
these  two  target  signatures,  and  to  fuse  them  into  one  combined  feature  vector.  This  feature  vector 
is  used  as  an  input  vector  into  a  backpropagation  neural  network.  The  backpropagation  neural 
network  arrives  with  a  final  recognition  decision. 

Fig.  37  shows  the  resulting  misclassification  rates  for  different  levels  of  noise  for  the  multisensor 
(using  the  DTi  and  DT2  target  signatures)  target  recognition  problem.  This  figure  also  shows  the 
misclassification  rate  of  the  target  recognition  system  using  Most  Discriminant  Wavelet  Coefficients 
(MDWC)  as  features.  The  misclassification  results  show  that  the  AMFRS  has  a  better  recognition 
accuracy  than  a  MDWC  based  target  recognition  system. 

6. . 6  Conclusions  and  Future  Research 

A  multisensor  recognition  methodology  (AMFRS)  is  proposed  based  on  features  selected  by  uti¬ 
lizing  the  Best  Discrimination  Basis  Algorithm  (BDBA),  symbolic  knowledge  about  the  domain, 
and  a  model-theory  based  fusion  framework.  The  symbolic  knowledge  about  the  domain  is  incor¬ 
porated  into  the  AMFRS  through  a  domain  theory  using  a  model-theory  based  fusion  framework. 
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Figure  37:  Misclcissification  Rates  for  the  Multisensor  Target  Recognition  Using  Model  Theory 
(AMFRS)  and  Most  Discriminant  Wavelet  Coefficients  (MDWC)  for  Feature  Selection 

A  model  of  a  domain  theory  represents  a  bridge  between  a  symbolic  knowledge  about  the  domain 
and  measurement  data. 

The  simulation  results  show  that  the  AMFRS  has  a  better  recognition  accuracy  than  a  MDWC 
based  recognition  system  with  respect  to  the  test  databases.  The  MDWC  based  target  recognition 
system  adapts  well  to  the  training  set  of  target  signatures,  but  it  does  not  have  enough  generalization 
power  to  perform  equally  well  with  the  test  data.  The  MDWC  used  to  train  a  classifier,  in  the 
discussed  recognition  problem,  are  concentrated  only  in  one  time/frequency  area.  We  are  able  to 
increase  the  generalization  power  of  the  target  recognition  system  by  selecting  interpretable  features 
which  are  more  spread  across  the  time/frequency  domain  and  which  capture  more  of  symbolic  target 
features.  To  select  such  interpretable  features,  we  utilize  the  symbolic  knowledge  (fused  theory) 
specific  to  the  recognition  problem  domain. 

In  this  work,  we  build  the  domain  theory  manually  based  on  the  analysis  of  the  target  signatures 
in  the  reference  database  (symbolic  knowledge)  and  the  most  discriminant  bases  determined  by  using 
the  BDBA.  In  the  future  research  we  are  going  to  incorporate  learning  capabilities  into  the  AMFRS. 
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